This recipe runs through the process of adding an entirely new data feed using TheonCoupler. It uses a checked-out working copy so that the changes can be committed.
It can be run from anywhere on a machine with the necessary packages installed and PostgreSQL suitably configured (see the configuration recipe).
(1) Create a local management database.
ttkm create aok
Note that this will entirely delete and re-create the management database if it already exists. Skip this step if you keep a persistent copy around.
(2) Check out an up-to-date version of the ModelLibrary.
svn checkout https://svn.theon.inf.ed.ac.uk/svn/theon/branches/alpha-schema-management/hypatia/library/hypatia /tmp/hypatia
cd /tmp/hypatia
(3) Build the reload data for the working copy version.
THEON_MODEL_XSLT_PATH=/usr/share/theon/self/xslt ttkm derive ddldata
(4) Reload the model.
ttkm reload
This reload may take a while depending on the size of the model.
(5) Start the UI server so we can edit the content of the management database to configure the TheonCoupler Stream and Couple.
ttkm self server ui
(6) Take the following steps after connecting to TheonUI in a browser on localhost at this URL: http://localhost:8000/ui.html
This feed takes the form of a text data file sent from another system, so a table needs to be added that the external content can be loaded into.
Open the "Dictionary" desktop. On the first panel ("Choose Model") ensure the right model is shown and then open the "Entity" panel. Click to add a new record and enter just the following values, leaving the rest of the entry fields blank:

  Name[Realised] = bis_apt_feed_3g
  Type           = Table

Query for the added record. Then add the following rows in the "Attributes" box:

  Position  Name[Realised]      Type    Content
  1         matric              Column  Text
  2         prog_of_study_code  Column  Text
  3         matric_instance     Column  Text
  4         course_code         Column  Text
  5         course_occurence    Column  Text
  6         course_delivery     Column  Text
  7         course_year         Column  Text
  8         course_school       Column  Text
  9         course_name         Column  Text
  10        course_mark         Column  Text
  11        course_result       Column  Text
  12        mark_status         Column  Text
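For orientation, the entity defined above corresponds roughly to a table like the one sketched below. This is only an illustration; the real DDL is generated later by ttkm derive ddl and may differ in schema placement and other details.

```shell
# Sketch only: the table implied by the entity definition above.
# The real DDL comes from "ttkm derive ddl" and may differ.
cat <<'EOF' > /tmp/bis_apt_feed_3g_sketch.sql
CREATE TABLE bis_apt_feed_3g (
    matric             TEXT,
    prog_of_study_code TEXT,
    matric_instance    TEXT,
    course_code        TEXT,
    course_occurence   TEXT,
    course_delivery    TEXT,
    course_year        TEXT,
    course_school      TEXT,
    course_name        TEXT,
    course_mark        TEXT,
    course_result      TEXT,
    mark_status        TEXT
);
EOF
```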
Now define the actual Stream that will load the table.
Open the "Streaming" desktop. On the first panel ("Choose Model") ensure the right model is shown and then add a new row in the "Streams" box:

  Stream-Label = apt
  Origin       = APT ?
  Feed Entity  = bis_apt_feed_3g
  Source       = Pipe
  Name[Info]   = apt

Open the "Stream" panel and ensure "apt" is shown (by query or row selection), then add one new row into the "Couples" box:

  Position  Source Entity    Target Entity   Name[Info]
  1         bis_apt_feed_3g  course_mark_3g  apt

Open the "Couple" panel and enter and save the following for the couple:

  Source Origin Value = 'APT'

Then add two new rows in the "Bonds" box:

  Position  Type
  1         Handle
  2         Column

Open the "Bond" panel and add rows in the "Pairs" box for the couple as below.

For "Handle":

  Position  Source Column       Target Column          Source Value (SQL)
  1         matric              euclid_student_code    'S'||@
  2         prog_of_study_code  euclid_programme_code
  3         course_code         euclid_course_code
  4         course_delivery     delivery
  5         course_occurence    occurence
  6         course_year         euclid_year_code       REGEXP_REPLACE(@,'^(...)([\d])/([\d])$','\1\2/\1\3')

For "Column":

  Position  Source Column  Target Column  Source Value (SQL)
  1         course_mark    uo_mark        (CASE WHEN @ = 'Absent' THEN NULL ELSE @ END)::INTEGER
  2         course_result  uo_grade
In the bond handle above some munging is done because the feed data is not quite in the correct format to match the existing handle data in the target table. An alternative would have been to define an intermediate view and use that as the couple source table.
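The effect of the two munging expressions can be sanity-checked outside the database. The sed commands below are only an illustration of the transformations (sed's [0-9] standing in for the PostgreSQL \d); the database itself applies the SQL expressions shown in the pairs table.

```shell
# 'S'||@ : prefix the matric number with "S" so it matches
# euclid_student_code in the target table.
echo "1234567" | sed -e 's/^/S/'
# gives: S1234567

# REGEXP_REPLACE(@,'^(...)([\d])/([\d])$','\1\2/\1\3') : expand a short
# year code such as 2013/4 into the full 2013/2014 form.
echo "2013/4" | sed -E 's!^(...)([0-9])/([0-9])$!\1\2/\1\3!'
# gives: 2013/2014
```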
Stop the server process with Control-C.
(7) Now export the changes and rebuild the necessary DDL, which can then be applied to the database. We will do this piecemeal as an example of the process. In practice it is simpler (and better) to do a pg_dump of the database content (using insert as a reload method works better across changes, but is a lot slower), re-create the database from scratch and then do a pg_restore (assuming the changes will not affect the restore, which they should not if they are purely additive).
ttkm export
First the new feed source table.
ttkm derive ddl from relation:bis_apt_feed_3g
For a table we should apply M-2TA_TABLENAME.sql, where M is indeterminate but irrelevant. So in this case:
(echo "BEGIN;"; cat derived/model/ddl/*-2TA_*.sql; echo "ROLLBACK;") \
  | psql infdb
Here the final psql infdb clause should be adjusted to update the relevant physical database. Where practical this should always first be done against a development instance rather than the live production system. If it applied correctly, replace ROLLBACK with COMMIT and run again.
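This test-then-commit pattern recurs for each DDL fragment below, so it could be wrapped in a small local helper. The sketch below is only a convenience; wrap_tx is a hypothetical name and not part of ttkm.

```shell
# Local helper (not part of ttkm): wrap one or more SQL files in a
# transaction so they can be test-applied (ROLLBACK) or applied for
# real (COMMIT). First argument is the transaction ending.
wrap_tx() {
    local end="${1:-ROLLBACK}"; shift
    echo "BEGIN;"
    cat "$@"
    echo "${end};"
}

# Dry run first, then apply for real once happy, e.g.:
#   wrap_tx ROLLBACK derived/model/ddl/*-2TA_*.sql | psql infdb
#   wrap_tx COMMIT   derived/model/ddl/*-2TA_*.sql | psql infdb
```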
Now the stream metadata changes to the target table need to be applied.
ttkm derive ddl from relation:course_mark_3g
In this case it's just one additional column, as the table is already used in other streams and we aren't maintaining the content of columns which are not already in another feed, so:
(echo "BEGIN;"; \
 awk -F, '/_apt_/{print "ALTER TABLE course_mark_3g ADD COLUMN "$1";"}' \
     derived/model/ddl/*-2TA_course_mark_3g.sql; \
 echo "ROLLBACK;") \
  | psql infdb
Again the final psql infdb should be adjusted to update the relevant physical database, and where practical this should always first be done against a development instance rather than the live production system. If it applied correctly, replace ROLLBACK with COMMIT and run again.
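To see what the awk filter in the previous command extracts, here is a standalone run on hypothetical input. The comma-separated line format with the column definition in the first field is an assumption (as are the sample column and marker names), so check it against your actual *-2TA_course_mark_3g.sql file before relying on it.

```shell
# Hypothetical sample lines (format and names assumed, verify against the
# real generated file): column definition in field 1, stream marker later.
printf '%s\n' \
    'uo_special_mark INTEGER,_apt_marker' \
    'other_col TEXT,_other_marker' \
  | awk -F, '/_apt_/{print "ALTER TABLE course_mark_3g ADD COLUMN "$1";"}'
# Only the _apt_ line produces output:
#   ALTER TABLE course_mark_3g ADD COLUMN uo_special_mark INTEGER;
```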
Now derive the TheonCoupler related changes.
ttkm derive ddl
It is not safe to derive from just stream:apt in this case, as some of the functionality is shared with other streams and an isolated derivation would lose their content.
For a stream we should apply, in order, the N-1ST_STREAM.sql file, then the N-2CO_TARGET.sql file (where N is indeterminate but must be the same for both 1ST and 2CO), and finally the M-1CC_TARGET.sql file (where M is indeterminate but irrelevant). So in this case:
N=`ls derived/model/ddl/*-1ST_apt.sql | awk -F- '{print $1}'`
(echo "BEGIN;"; \
 cat ${N}-1ST_apt.sql \
     ${N}-2CO_course_mark_3g.sql \
     derived/model/ddl/*-1CC_course_mark_3g.sql; \
 echo "ROLLBACK;") \
  | psql infdb
As before, the final psql infdb should be adjusted to update the relevant physical database, and where practical this should first be done against a development instance rather than the live production system. In this case, because other streams already exist for the target table, trigger creation errors will occur, but these can be safely ignored. If it otherwise seems to apply correctly, remove BEGIN and ROLLBACK and run again.
(8) Rebuild the TheonCoupler configuration.
ttkm derive coupler
(9) Test the new feed. We will do this in stages in case of errors and so we don’t risk any data damage.
First load the feed table.
ttkm stream apt refresh from DATAFILE couple none
Here DATAFILE is the path to the feed data file.
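For a first test a minimal hand-made feed file can be handy. The sketch below assumes one record per line with the twelve fields tab-separated, in the attribute order defined earlier; the delimiter, the layout, and every field value here are assumptions, so verify against the actual feed specification from the sending system before use.

```shell
# Build a tiny hypothetical feed file (TAB-separated is an assumption;
# check the real feed specification). Field order follows the entity:
# matric, prog_of_study_code, matric_instance, course_code,
# course_occurence, course_delivery, course_year, course_school,
# course_name, course_mark, course_result, mark_status
printf '1234567\tPHYS1\t1\tPHYS08016\t1\tSEM1\t2013/4\tPhysics\tDynamics\t65\tP\tFinal\n' \
    > /tmp/apt-sample.txt

# Then, for example:
#   ttkm stream apt refresh from /tmp/apt-sample.txt couple none
```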
Now run the couple manually, first under a transaction to test; the function name can be found in the derived/coupler/apt.cpl file.
psql infdb -c "BEGIN; SELECT _th_apt_course_mark_3g_couple(); ROLLBACK;"
If the above worked without error, next check (after the SELECT) that the content has been altered as expected. Then do a final test by running the stream fully.
ttkm stream apt refresh from DATAFILE
(10) Commit the changes.
First add the new elements representing the feed table, couple function and stream definition, so in this case:
svn add schemat/relation/bis_apt_feed_3g.xsd
svn add schemat/process/_th_apt_course_mark_3g_couple.xsd
svn add schemat/stream/apt.xsd
Then commit these and the modified objects (the target table):
svn commit schemat/relation/bis_apt_feed_3g.xsd \
    schemat/process/_th_apt_course_mark_3g_couple.xsd \
    schemat/stream/apt.xsd \
    schemat/relation/course_mark_3g.xsd
(11) Build a new release.