Chapter 33. Creating a new ModelLibrary from scratch

This recipe is for creating a new model from scratch in Theon, including using the coupler and factory. To use a real world example this will be for the Matlab TAH Licensing Anaysis Database, otherwise known as "mattah" (as in "wot’s the").

This recipe can be run on any machine that has the necessary packages installed and suitable configuration of PostgreSQL (see configuration recipe).

In addition this recipe has some support files, locally available. Obtain the support files archive (mattah2theon.tgz) and extract somewhere suitable and set the path in $MATLAB2THEON to point at them. Then do an initial database load from the dump file.

cd /tmp
tar zxvf -/matlab2theon.tgz
export MATLAB2THEON=/tmp/matlab2theon
createdb matlab
psql matlab < ${MATLAB2THEON}/matlab.sql

(1) Create our new ModelLibary directory and change into it. We will start from the existing prototype database called matlab (available to load from the recipe example support files, see above) on localhost so we launch with a LiveProfile for that.

ttkm mattah launch at /tmp using database=matlab host=/disk/scratch
cd /tmp/mattah

(2) Create a local management database (destroy prior instance, don’t do this if you have un-exported content). Import our current matlab database.

ttkm create aok
ttkm import

(3) Reset our LiveProfile for what will be the new Theon managed database name.

ttkm launch aok using database=mattah host=/disk/scratch

(4) Update the model default database name for consistency, we use the UI to do this.

ttkm self server ui

Point a locally running browser at localhost:8000. Choose the Dictionary desktop and on the first panel select the mattah model (it will be the only model if the management database was created anew in step 1). Change the Name[Realised] value from matlab to mattah. Then Control-C the server process.

(5) Now export the model and do a test installation.

ttkm export
ttkm derive ddl
ttkm install
psql mattah -c "\d"

If all ok then that is the database now held in Theon as a ModelLibrary. We can for re-assurance though do a diff between the original and this by dumping each and comparing (although note that they are rather unlikely to be absolutely identical, but would be looking here for any important functional differences).

pg_dump95 -s matlab > /tmp/m.old
pg_dump95 -s mattah > /tmp/m.new
diff /tmp/m.old /tmp/m.new

In this case only indexes have not been imported. FIX ME. Then delete the original database.

dropdb matlab

(6) We now want to setup a factory to build some additional views and functions for the model.

mkdir -p factory/generator/mattah
mkdir factory/templates

(7) Create a generator specific to this model to contain some defaults for view templates. This can be copied from the recipe support files.

cp ${MATLAB2THEON}/factory/generator/mattah/view.grg factory/generator/mattah

Review the content.

$ cat factory/generator/mattah/view.grg
%%include "relation/view.grg"
%%equate relation_ViewName BASENAME(_eq_file, ".fat")
%%equate relation_ViewValueBlock "SQL_View"
%%equate relation_ForgeViewRequiresBlock "SQL_ViewRequires"

The first line includes the base class in the Factory to build a View. The second line defaults the name of the view to be the same as the name of the defining template file. The next two lines define the block names that will be used to define the view itself as well as any pre-requisites.

(8) Create a file for each individual view. These will each be named factory/templates/VIEWNAME.fat. Copy each from the recipe support files.

cp ${MATLAB2THEON}/factory/templates/concurrency_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/costs_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/maximums_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/report.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/usage_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/usage_data.fat factory/templates

Review the content of each, the structure of which will be as below.

%%include "mattah/view.grg"

%%block SQL_ViewRequires
  VALUES
    ( 'VIEWNAME', NULL ),
    ( 'VIEWNAME', NULL )
%%end

%%block SQL_View
  <<<VIEW SQL DEFINITION>>>
%%end

Note that the template files do not define any dependencies on tables (these will be imported below), but dependencies on views have been added to ensure the correct creation order.

(9) Process the Factory to create all the views in the management database.

ttkm gather

(10) Export, re-derive and re-install the mattah database. Then import again (this time from the mattah database as it is the default in the LiveProfile) which will set up each View dependencies and columns, then export again to update the SchemaTree.

ttkm export
ttkm derive ddl
ttkm install aok
ttkm import
ttkm export

Check the views have been defined.

psql mattah -c "\d"

(9) Next a quick test that the default TheonUI Browser interface works. First derive the UI meta data. Then start up the server.

THEON_MODEL_XSLT_PATH=/usr/share/theon/ui/xslt ttkm derive ui
ttkm server ui

Navigate a locally running browser to localhost:8000 and choose the Browser desktop, All Clusters and then test opening each of the tables under the Choose Table menu (although obviously there will be no data in any of them yet). Note that the views will not be listed, they are not directly accessable via TheonUI. Then Control-C the server process.

(10) Set up a basic Stream to load the products table data. This Stream does not have any couples as the data is loaded straight into the target table an no further work is required. Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the Stream.

Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = Products
  Origin = PROD
  ? Feed Entity = products
  Source = File
  Format = CSV
  Name[Info] = Products

Control-C the server process. Then build the TheonCoupler configuration.

ttkm export
ttkm derive coupler

Do a test to run the stream agent and load the data. A data file in the correct format is available in the recipe support files.

ttkm stream agent products refresh from ${MATLAB2THEON}/products.csv

Check the data loaded.

psql mattah -c "SELECT * FROM products;"

(11) Set up a Stream with a Couple to load the licenses table data. This Stream will use a FDW (Foreign Data Wrapper) which will be created live and then imported back into the management database. The data file that is the source for the FDW is available in the recipe support files.

psql mattah -c "CREATE EXTENSION file_fdw;"
psql mattah <<EOF
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE licenses_data (
  logentry INTEGER,
  logdate DATE,
  logtime TIME WITHOUT TIME ZONE,
  state TEXT,
  product TEXT,
  uun TEXT,
  machine TEXT,
  infinnac INTEGER
) SERVER file_server
OPTIONS (format 'csv', filename '${MATLAB2THEON}/licenses.csv');
EOF

Import the added DDL back into the management database. This will only import the extension defintion however as FDW tables are not natively supported in Theon.

ttkm import

Add the necessary DDL for the FDW manually into the management database (although this could also be done by a Factory Template). Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the FDW.

Open the "Dictionary" desktop.
On the first panel ("Model") ensure the right model is shown and then
add a new row in the "Entities" box:
  Name[Realised] = licenses_data
  Type = Container
  Content = SQL
Open the "Entity" panel.
Query for the "licenses_data" record.
Copy the SQL above (between the EOF delimiters, so not the extension) into the
"Value" field (remember to manually expand the ${MATLAB2THEON} variable when
doing this as it won't be expanded within PostgreSQL).
Add the following rows in the "Attributes" box:
  Position      Name[Realised]  Type    Content
  1             logentry        Column  Number/Integer
  2             logdate         Column  Date
  3             logtime         Column  Time
  4             state           Column  Text
  5             product         Column  Text
  6             uun             Column  Text
  7             machine         Column  Text
  8             infinnac        Column  Number/Integer

Control-C the server process.

Export, re-derive and re-install to check the FDW has been correctly added.

ttkm export
ttkm derive ddl
ttkm install aok
psql mattah -c "SELECT * FROM licenses_data LIMIT 10;"

Create the Stream and a Couple using the FDW source and with a literal to add an extra column’s content. Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the FDW.

Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = Licenses
  Origin = LICE
  Name[Info] = Licenses
Open the "Stream" panel and ensure "Licenses" is shown and then add a new row
in the "Couples" box:
  Position = 1
  Source Entity = licenses_data
  Target Entity = licenses
  Name[Info] = Licenses
Open the "Couple" panel and enter and save the following.
  Source Origin Value = 'LIC'
  Then add two new rows in the "Bonds" box:
    Position    Type
    1           Handle
    2           Column
Open the "Bond" panel and add rows in the "Pairs" box as below:
  For "Handle":
    Position = 1
    Source Column = logentry
    Target Column = logentry
  For "Column":
    Position    Source Column   Target Column
    1           logdate         logdate
    2           logtime         logtime
    3           state           state
    4           product         product
    5           uun             uun
    6           machine         machine
    7           infinnac        infinnac
  Then add one more row for "Column":
    Position = 8
    Source Value = (@logdate || ' ' || @logtime)::TIMESTAMP
    Target Column = logts

Control-C the server process.

Rebuild the database to accommodate changes made for TheonCoupler.

ttkm export
ttkm derive ddl
ttkm install aok
ttkm derive coupler

Build the TheonCoupler configuration.

ttkm derive coupler

Do a test to run the stream agent and load the data. The licenses data is a large file creating nearly one million records so this process will take approximately 10 minutes. Note that an FDW was used here as an exemplar and is not the most optimal way to load and process this data.

ttkm stream agent licenses

Check the data loaded.

psql mattah -c "SELECT * FROM licenses LIMIT 10;"

Note that when using a Container as a source for a Couple like this it must be (produce) a table-like or view-like object to which permissions can be granted and to which a SELECT can be applied.

(12) Set up a Stream to load the usage table data. This Stream uses a View, which will be defined by Factory Template first, and one Couple to process.

Create the Factory Template file for the file, this can be copied from the recipe support files.

cp ${MATLAB2THEON}/factory/templates/usage_data.fat factory/templates

Process the Factory content, then rebuild the database with the new view.

ttkm gather
ttkm export
ttkm derive ddl
ttkm install aok

Do an import to pick up the new View’s columns so they can be used in the definiton of the Couple.

ttkm import

Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the Stream and Couple.

Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = Usage
  Origin = USAG
  Name[Info] = Usage
Open the "Stream" panel and ensure "Usage" is shown (by query or row selection)
and then add a new row in the "Couples" box:
  Position = 1
  Source Entity = usage_data
  Target Entity = usage
  Name[Info] = Usage
Open the "Couple" panel and enter and save the following.
  Source Origin Value = 'USE'
  Then add two new rows in the "Bonds" box:
    Position    Type
    1           Handle
    2           Column
Open the "Bond" panel and add rows in the "Pairs" box as below:
  For "Handle":
    Position = 1
    Source Column = uun
    Target Column = uun
  For "Column":
    Position = 1
    Source Column = n
    Target Column = n

Control-C the server process.

FIXME - the Source Origin Value and Origin are mandatory so probably should have a default.

Now export the changes, rebuild the DDL and re-install the database in order to add the TheonCoupler support functions.

ttkm export
ttkm derive ddl
ttkm install aok

Rebuild the TheonCoupler configuration.

ttkm derive coupler

Test the feed works - in principle only as there is no data for it to process (unless we first rerun the licenses stream).

ttkm stream agent usage

(13) Set up a Stream to load the category table data. This Stream uses a live dblink to the central infdb database to get the base data and a view to wrap the data. While the two could be amalgamated this will result in authentication issues - the dblink is run with the authentication of the PostgreSQL server, which if using the supplied configuration will be the local user which should have access to infdb. The Couple is run as a specific role without (by default) the necessary remote access.

Create the Factory Template files for the dblink function and wrapper view, these can be copied from the recipe support files.

cp ${MATLAB2THEON}/factory/templates/people_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/people_data_type.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/people.fat factory/templates

Review the content. The first file defines the function, the second the type returned by the dblink call and the third the wrapper view.

Process the Factory content, then rebuild the database with the new structures.

ttkm gather
ttkm export
ttkm derive ddl
ttkm install aok

Test the dblink returns data from infdb via the view wrapper.

psql mattah -c "SELECT * FROM people;"

Do an import to pick up the new View’s columns so they can be used in the definiton of the Couple.

ttkm import

Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the Stream and Couple.

Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = Categories
  Origin = CATE
  Name[Info] = Categories
Open the "Stream" panel and ensure "Categories" is shown (by query or row selection)
and then add a new row in the "Couples" box:
  Position = 1
  Source Entity = people
  Target Entity = category
  Name[Info] = Category
Open the "Couple" panel and enter and save the following.
  Source Origin Value = 'CAT'
  Then add two new rows in the "Bonds" box:
    Position    Type
    1           Handle
    2           Column
Open the "Bond" panel and add rows in the "Pairs" box as below:
  For "Handle":
    Position = 1
    Source Column = uun
    Target Column = uun
  For "Column":
    Position    Source Column   Target Column
    1           status          status
    2           category        category
    3           fundingcat      fundingcat

Control-C the server process.

Now export the changes, rebuild the DDL and re-install the database in order to add the TheonCoupler support functions.

ttkm export
ttkm derive ddl
ttkm install aok

Rebuild the TheonCoupler configuration.

ttkm derive coupler

Test the feed works.

ttkm stream agent categories

Check loaded content.

psql mattah -c "SELECT * FROM category LIMIT 10;"

(14) The final Stream to set up is used for internal processing rather than for loading data. It uses a paramaterized function as a primary source which can be called on demand through the stream in order to analyse specific ranges of license data. This populates the aligned table with corresponding licenses data. One Couple does this. Three further Couples do more processing in a pipeline from the aligned table into the concurrency table, then into the maximums table then into the costs table.

Create the Factory Template files for the base function and return type, these can be copied from the recipe support files.

cp ${MATLAB2THEON}/factory/templates/aligned_data.fat factory/templates
cp ${MATLAB2THEON}/factory/templates/aligned_data_type.fat factory/templates

Review the content. The first file defines the function, the second the type.

Process the Factory content, then rebuild the database with the new structures.

ttkm gather
ttkm export
ttkm derive ddl
ttkm install aok

Do an import to pick up the new Function’s paramaters and return type columns so they can be used in the definiton of the Couple.

ttkm import

Start up a management database server.

ttkm self server ui

Navigate a locally running browser to localhost:8000 and follow the following steps to create the Stream and Couples.

Open the "Dictionary" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
open the "Entity" panel and find the "aligned_data" entity. The return type
columns must be manually added so they can be referenced by the Couple as they
are not expanded automatically (FIXME). Add the following rows in the
"Attributes" box:
  Position      Name[Realised]  Type    Content
  1             checkout_le     Column  Number/Integer
  2             checkout_ts     Column  Date/Time
  3             checkin_ts      Column  Date/Time
  4             uun             Column  Text
  5             category        Column  Text
  6             machine         Column  Text
  7             product         Column  Text
Open the "Streaming" desktop.
On the first panel ("Choose Model") ensure the right model is shown and then
add a new row in the "Streams" box:
  Stream-Label = Aligned
  Origin = ALIN
  Name[Info] = Aligned
Open the "Stream" panel and ensure "Aligned" is shown (by query or row selection)
and then add four new rows in the "Couples" box:
  Position      Source Entity   Target Entity   Name[Info]
  1             aligned_data    aligned         Aligned
  2             concurrency_data        concurrency     Concurrency
  3             maximums_data   maximums        Maximums
  4             costs_data      costs           Costs
Open the "Couple" panel and enter and save the following same data for each of
the four couples above.
  Source Origin Value = 'ALI'
  Then add two new rows in the "Bonds" box:
    Position    Type
    1           Handle
    2           Column
Open the "Bond" panel and add rows in the "Pairs" box for the "Aligned" couple
as below:
  For "Handle":
    Position = 1
    Source Column = checkout_le
    Target Column = checkout_le
  For "Column":
    Position    Source Column   Target Column
    1           checkout_ts     checkout_ts
    2           checkin_ts      checkin_ts
    3           uun             uun
    4           category        category
    5           machine         machine
    6           product         product
Open the "Bond" panel and add rows in the "Pairs" box for the "Concurrency" couple
as below:
  For "Handle":
    Position    Source Column   Target Column
    1           snap_ts         snap_ts
    2           category        category
    3           product         product
  For "Column":
    Position    Source Column   Target Column
    1           num_concurrent  num_concurrent
Open the "Bond" panel and add rows in the "Pairs" box for the "Maximums" couple
as below:
  For "Handle":
    Position    Source Column   Target Column
    1           category        category
    2           product         product
  For "Column":
    Position    Source Column   Target Column
    1           maxcon          maxcon
Open the "Bond" panel and add rows in the "Pairs" box for the "Costs" couple
as below:
  For "Handle":
    Position    Source Column   Target Column
    1           pgrouping       grouping
    2           product         product
  For "Column":
    Position    Source Column   Target Column
    1           current_prov    current_prov
    2           required_prov   required_prov
    3           purchase_cost   purchase_cost
    4           annual_cost     annual_cost
    5           total_3yr_cost  total_3yr_cost

Control-C the server process.

Now export the changes, rebuild the DDL and re-install the database in order to add the TheonCoupler support functions.

ttkm export
ttkm derive ddl
ttkm install aok

Rebuild the TheonCoupler configuration.

ttkm derive coupler

Do a notional test of the Stream, although note at this stage there is no data.

ttkm stream agent aligned couple aligned passing lower as \"10\", upper as \"100\"
ttkm stream agent aligned couple concurrency
ttkm stream agent aligned couple maximums
ttkm stream agent aligned couple costs

(15) Reload all data and process internal streams. Note that the aligned and particularly concurrency couples can take a long time (many minutes) depending on the values of the lower and upper parameters specified.

ttkm stream agent products refresh from ${MATLAB2THEON}/products.csv
ttkm stream agent categories
ttkm stream agent licenses
ttkm stream agent usage
ttkm stream agent aligned couple aligned passing lower as \"10\", upper as \"1000\"
ttkm stream agent aligned couple concurrency
ttkm stream agent aligned couple maximums
ttkm stream agent aligned couple costs

Check the final output for the analysis.

psql mattah -c "SELECT * FROM report;"

(16) Produce documentation for the ModelLibrary. This will be rather sparse as not a lot of comments were added in the entries above.

ttkm derive ddldoc

Navigate a locally running browser to file:///tmp/mattah/derived/model/html/index.html to review the full documentation.

(17) Commit and deploy the ModelLibrary. Commit the SchemaTree, create a release and build a package so it can be deployed and run by anyone on their own machine or to be installed on a central server.

git add schemat
git commit
ttkm release
ttkm package
ls -l package
total 132
-rw-r--r-- 1 timc people 21512 Mar 31 13:53 mattah-1.1-1.src.rpm
-rw-r--r-- 1 timc people 35360 Mar 31 13:53 mattah-theon-library-1.1-1.noarch.rpm
-rw-r--r-- 1 timc people 52124 Mar 31 13:53 mattah-theon-library-devel-1.1-1.noarch.rpm
-rw-r--r-- 1 timc people 18480 Mar 31 13:53 mattah-theon-library-ui-1.1-1.noarch.rpm
  1. do a test install of rpm here, then installing/running from system installation …

Use UI to look at tables, to look at licenses table (massive) need to add:

"limit_rows" : 250,
"limit_rows_conditionally" : True,

to derived/ui/default/ui.conf and then query conditionally.