Chapter 8. Schema Representation

The external database schema representation language used in Theon is XML-Schema, with some additional annotation elements to cover aspects of a database schema which cannot naturally be expressed in XML-Schema. This is not a typical approach. XML-Schema would normally be used to define an XML Document Type which would then be used to represent the schema (to allow an instantiation of that schema in XML). However, XML-Schema is such a close parallel to the core ER model used in a relational database that in this architecture it was appropriate to use it as-is (at least as the foundation).

Each "significant" element of a database schema is represented in Theon as an individual file formatted as an XML-Schema document and referred to as an XSD file. The XSD files are all held together in one structured directory hierarchy called schemat in the ModelLibrary. The XSD files and their containing directory hierarchy are together referred to as a SchemaTree. An XSD file can include (by XML-Schema inclusion syntax) other XSD files from anywhere else in its own SchemaTree. A single XML-Schema document (XSD file) is then aggregated automatically by schematic traversal of the Schema Tree starting from any one of the individual XSD files. This single XSD file is then used for all further processing of the XML-Schema (principally transformations by XSLT).

XML documents are not particularly suited to human editing. So all work on the XML-Schema is normally carried out using a PostgreSQL physical database called XSDDB. This holds a far more easily manipulatable snapshot of a SchemaTree. Although using XSDDB is not required, direct editing of the XSD files in a SchemaTree for any change even slightly more than trivial is really very hard to get right and not recommended.

The XSDDB database is also used as a hub to collate the output of automated database schema generation processes (made using the SchemaGactory). These are much more naturally adapted to running against a database using SQL than to directly parsing and generating raw XSD files in a SchemaTree.

Hence, the workflow for managing a SchemaTree representing a database schema is simply: load it into a personal instance of XSDDB; modify the content of XSDDB; re-generate a new version of the SchemaTree based on that modified content. In this way the use of the XSDDB database to maintain content (which can be done in a variety of ways described later) means that it can be effectively treated as a very customised (to the architecture of a model in Theon) XML-Schema editor.

8.1. XML-Schema Tree of XSD Files - Directory Structure

Whether the SchemaTree is held in a version control system or not the directory structure remains the same. From the starting path, which generally when working with Theon will be the current working directory, the structure is as below.

    schemat/
      cluster/
        *.xsd
      relation/
        *.xsd
      process/
        *.xsd
      role/
        *.xsd
      stream/
        *.xsd
      desktop/
        *.xsd
      panel/
        *.xsd
      live/
        *.xsd
      undefined/
        *.xsd
      MODEL.xsd

The SchemaTree is entirely held in a directory called schemat. This contains a number of sub-directories and a single file. Each sub-directory is only present if there is at least one instance of that element in the model represented. The MODEL.xsd file will actually be a file with the name of the model the SchemaTree represents (using the value of the tag attribute for that model).

In the root of the SchemaTree is the MODEL.xsd file containing the base description of the physical database being represented. This includes most (normally all) of the other XSD files in other parts of the tree. Any XSD files not included by MODEL.xsd (directly or indirectly by subsidiary inclusion) will not be part of any transformations on the SchemaTree. All subsequent processing is usually based on producing an XML-Schema document by schematic traversal of MODEL.xsd and of whatever dependencies it has. However, partial (and viable) processing can generally be done starting schematic traversal with any of the XSD files described below, thus creating a partial output based only on that fragment of the SchemaTree. For example, processing a cluster/CLUSTER.xsd file would produce an XML-Schema document for that cluster only (but including any dependent relations, processes, desktops etc). Consequently a functional fragment of a physical database schema can be produced. This only works in practice if the model has been structured in such a way that it can be modularised and all of the dependencies of each module are explicitly defined. However, being able to initiate processing at any point in the SchemaTree is useful for testing and debugging.

The cluster directory contains an XSD file per cluster (a grouping of relations and processes serving a unified functionality or common class). Each cluster XSD file directly includes the relevant XSD files from relations and processes. The cluster XSD file will be named using the value of the tag attribute for that cluster. A cluster used in Theon should not be confused with the usage of cluster in PostgreSQL itself (a distinct installation of multiple databases).

The relation directory contains an XSD file per relation (any entity that is not a function, for example a table, view, type or sequence). Each entity XSD file will include any required XSD files in relations and processes. For example the XSD file containing a view definition will have an include to a relations XSD file for each table included in the view query. The entity XSD file will be named using the value of the realised name attribute for that entity.

The process directory contains an XSD file per function (any entity that is a function type). The process XSD file will be named using the value of the realised name attribute for that function. The name will be suffixed with a hashtag if the corresponding entity has been defined with overload. This is needed as often a function name is not itself unique - it is the name plus the distinct types of each argument that make up a unique identifier. So, in Theon, the hashtag is just an encoding of the argument types to make a simple unique process name.

The role directory contains an XSD file per role. The role XSD file will be named using the value of the realised name attribute for that role. However, the name of the actual corresponding role created in PostgreSQL will always be prefixed by MODEL_ (where MODEL is the value of the tag attribute for the model being represented by the SchemaTree). This is because, although roles are defined as being local to each model in Theon, in PostgreSQL they are actually global to an entire PostgreSQL installation (or cluster), and are not specific to any one physical database. So, to guarantee that the name is unique and to facilitate role management - making the assumption that all databases in the PostgreSQL installation or cluster are managed with Theon, or if not, that they follow the convention - the model name prefix is applied.

The undefined directory contains an XSD file per entity that contains content that is not able to be fully represented within Theon. In principle it can contain anything, including the SQL DDL for an entire model where necessary. The XSD file will be named using the value of the entity name attribute for that entity. The content is included as-is in the realisation of the model.

The live directory contains an XSD file used for generation of a LiveProfile. It is not strictly something which needs to be saved as part of the SchemaTree.

The stream directory contains an XSD file per stream. The stream XSD file will be named using the value of the tag attribute for that stream. A stream is not itself directly realised as an element in the physical database structure. Instead, a stream definition results in changes to relations, and adds relations and processes, which are then realised in the physical database. Some aspects of a stream are never actually represented in the SchemaTree itself, but produced purely as a result of subsequent transformations. Other aspects of streams do consequently produce changes in the SchemaTree, as there is a need for them to be directly referred to by other elements (this is achieved via processes within XSDDB itself). In essence, the stream XSD file is a high level description of facets of relations and processes represented in other parts of the SchemaTree, and aspects of the physical database structure which are created purely by transformation, with no other direct representation in the SchemaTree.

All of the sub-directories described above contain parts of the SchemaTree which ultimately define the actual physical database structure (its schema) represented by the model (clusters do this indirectly by acting as a grouping agent). The remaining sub-directories below contain parts of the SchemaTree that have no mapping in the physical database structure. Instead they are used to represent elements used by TheonUI.

The desktop directory contains an XSD file per desktop available to TheonUI for the model. Each desktop XSD file directly includes the relevant XSD files from panels. The desktop XSD file will be named using the value of the tag attribute for that desktop. A desktop is a structured collection of panels (generally created specifically for that desktop, but they need not be) and the live template to use for the panels on that desktop (for example preset query conditions).

The panel directory contains an XSD file for each panel that can be used on a desktop. The panel XSD file will be named using the value of the tag attribute for that panel. A panel is an embedded representation of multiple relations (tables or views) providing a simplified presentation of complex structure for user maintenance of the data.

The internal structure of each kind of XSD file described above is outside the scope of this documentation. This is because it is primarily an implementation detail, and there is no need to know the details to use Theon in a private installation. In a shared version controlled installation where the possibility of conflicting changes on commit arises, some knowledge of each XSD file is useful, if only to identify what in XSDDB needs to be altered to resolve the conflict.

8.2. XSDDB Structure

This is an overview of the structure of the Theon XSDDB database itself. It is not strictly necessary to know the full details of this in order to manage a physical database as a model in Theon. However, it is useful as a terminology overview (in particular when using the TheonUI desktops to maintain model content).

The XSDDB database is used as a more effective way of maintaining the master representation of a model held in the SchemaTree XSD files. This is necessary, as human editing of the raw XSD XML files is practically impossible, or certainly unpleasant. It is also necessary because each distinct process for the automated creation or maintenance of model elements works much more easily (and consistently) against a database using SQL than it would with each having to parse and/or generate the XSD XML files itself. XSDDB provides a more powerful (but generic) management interface for the model.

The XSDDB database can represent simultaneously the model for any number of physical databases. However, it is only ever a snapshot of the SchemaTree XSD files for each model, and it cannot be updated from these files again once the snapshot has been taken. This is because it would require the database (or some application built around it) to incorporate merge logic handling (when the SchemaTree XSD files have been changed independently from the time of the snapshot). That function is better handled by whatever version control system holds the actual SchemaTree XSD files themselves.

Each user would usually have their own independent copy of an XSDDB database which could be created and destroyed as necessary and would hold any number of snapshots of SchemaTree XSD files. This does not mean that for each user the XSDDB database cannot be persistent - just that prior to any reload of Schema Tree XSD files, the current data in the database representing those files must first be exported into the working copy, and merged/committed back. If this is not done, any changes made within the database representing content in those SchemaTree XSD files could be lost on reload. The XSDDB database is effectively a different container for the "working copy" of the SchemaTree XSD files, but the filesystem "working copy" of these files is always the master.

This is all less relevant in the case of a user with a private installation of Theon not doing version control (or anyway not against a central shared repository). In this case it is more likely that the XSDDB will be persistent and used as the effective master of the content, and an export of the Schema Tree XSD files only done as and when changes are made that need to be deployed.

8.2.1. Architecture

A diagram of the XSDDB database is shown below and explained within this section. The diagram shows each table grouped by first class element. A first class element is a table directly belonging to the top level Model table without any further mandatory dependency beyond itself. Pure mapping tables are not shown as they are just structural artefacts necessary to represent many to many relationships - instead these are implicitly shown by double arrow loops (either within a table if self-referential or across tables if not). Single arrow loops within a table represent a self-referential relationship. Single arrow lines between tables represent a parent-child relationship in the direction of the arrow.

Figure 8.1. XSDDB Architecture

XSDDB_Architecture.svg

The XSDDB database is virtually divided into two clusters. The Dictionary cluster contains the representation of a physical database’s structures. It also contains the representation of the TheonCoupler configuration, since that is ultimately realised as structures in a physical database. The Reflection cluster contains the representation of the TheonUI components that are associated with a particular physical database.

The Model (MOD) table exists in both clusters. It holds the core facets of each physical database and its schema. A model is ultimately realised as a single PostgreSQL physical database. Although there may be multiple instantiations of that database, each would be under a different installation.

A model is exported from the XSDDB as a single XSD file (with inclusion lines for every first class element that is contained in the model) in the top level directory of the model’s SchemaTree.

The Cluster (CLU) table also exists in both clusters. It serves to group together structures where there is a commonality. This might be a commonality of process, such as: all these tables, functions and desktops together (and with no other dependencies) form a particular business process. Or a cluster might be a commonality of type, such as: all these tables have a common structure as they are used in the same way across multiple processes. Structures can appear in any number of different clusters depending on each cluster’s particular grouping arrangement. Clusters are never actually realised as an object within a physical database, but are explicitly used in TheonUI to structure table selection and linkage and for modular component packaging, documentation and release management. The closest parallel to cluster in a physical database would however be "scheme". Clusters are entirely optional within any model. Objects that do not belong to any cluster are always still included in the realisation or the exported XML-Schema of a model. A scheme is a special kind of cluster that contains other clusters (as well as optionally other structures). A cluster does not need to belong to a module but if it does can only belong to one. A scheme is currently used solely by TheonUI to structure desktop selection.

A cluster (or module) can contain most of the first class elements (those with a direct belonging to Model and that can exist without any dependency beyond that) below, so: Entity, Role, Stream and Desktop but not Panel (included through Desktop) or any other elements (those that do not have a direct belonging to Model, or do but have additional dependencies).

A cluster is exported from the XSDDB as a single XSD file (with inclusion lines for every first class element that is contained in the cluster) in the cluster directory of the model’s SchemaTree.

8.2.2. Dictionary

The following tables are primarily associated with the Dictionary cluster.

The Entity (ENT) table is a "composite" table in that it and a number of other tables together represent entities in the model. An entity can be any (supported) PostgreSQL "relation". At present an entity can be realised in a physical database for any one of the following relations: Table, Type, View, Index, Enumerate, Sequence, Function (and some sub types of function). An entity can also be a "container" of raw SQL for unsupported relation types which cannot be directly represented.

An entity is exported from the XSDDB as a single XSD file (with inclusion lines for what are dependencies of the entity) in either the relation or process directory of the model’s SchemaTree.

Tables belonging to an Entity (those that help to define it) are: Attribute (ATR) for entity attributes (such as table columns, function parameters and type options); Domain (DOM) for attribute value sets; Key (KEY) for primary and unique keys; AKMap (AKM) for correlating key attributes; Relationship (REL) for inter-entity constraints (foreign keys); RAKMap for correlating constraint attributes; Event (EVE) for change triggered processes.

The Role (ROL) table is a "composite" table in that it and a number of other tables represent permissions in the model. A role is realised in a physical database as a PostgreSQL "role". However, roles in PostgreSQL are installation/cluster wide and not associated with specific physical databases or unique per-database as defined here. The schema generation part of Theon prefixes role names with the model name (if not already done manually) on realisation. This makes them unique across the installation and manageable as pseudo per-database roles. However, this is only fully effective with careful administration of all PostgreSQL roles in the installation or if all user databases in the installation are managed via Theon.

A role is exported from the XSDDB as a single XSD file in the role directory of the model’s SchemaTree.

Tables belonging to a Role (those that help to define it) are: Capability (CAP) for entity and entity attribute permissions associated with the role; CAMap (AKM) for correlating capability attributes.

The Stream (STR) table is a "composite" table in that it and a number of other tables represent streams (data sources and synchronisation processes) used in the model. A stream is itself not directly realised within a physical database. However, the defined stream functionality is automatically expanded within XSDDB into additional entity attributes (for metadata) and entities (for synchronisation processes) that are then ultimately realised as PostgreSQL relations in a physical database.

A stream is exported from the XSDDB as a single XSD file in the stream directory of the model’s SchemaTree. The actual implementation of a stream is achieved by export of the additional entity attributes and entities as XSD files in the relation and process directories and also by the derivation steps that are part of TheonCoupler.

Tables belonging to a Stream (those that help to define it) are: Couple (COU) defining each "source to target" synchronisation process; Bond (BON) defining each type of grouping that is tied by the synchronisation process; Pair (PAI) defining the columns in the source and target (and lookup) that together make a grouping.

8.2.3. Reflection

The following tables are only present in the Reflection cluster. The Reflection cluster also includes some tables from the Dictionary cluster as reference to actual physical database structure: Entity, Attribute and Relationship.

The Desktop (DES) table is a "composite" table in that it and a number of other tables represent desktops used in the model (under TheonUI). These are not realised in a physical database, but instead define metadata shared by the client and server applications of TheonUI.

A desktop is exported from the XSDDB as a single XSD file in the desktop directory of the model’s SchemaTree.

Tables belonging to a Desktop (those that help define it) are: Link (LIN) holding each of the panels used on a desktop.

The Panel (PAN) table is a "composite" table in that it and a number of other tables represent panels used in the model (on desktops under TheonUI). These are not realised in a physical database, but instead define metadata shared by the client and server applications of TheonUI.

A panel is exported from the XSDDB as a single XSD file in the panel directory of the model’s SchemaTree.

Tables belonging to a Panel (those that help define it) are: Connector (CON) for the entities (tables, views, functions and their relationships) encapsulated by the panel; Field (FIE) for the entity attributes (columns and returns) used on the panel; Presenter (PRE) for the presentation types (such as grids or layers) used to group fields on the panel.

The following tables also belong to a Panel (and help define it) but are not actually included in the exported XSD file for a panel: Lock (LOC) for default attribute values to use on record creation; Sequence (SEQ) for default attribute ordering in queries; Template (TEM) for default attribute filtering in queries. Instead these are aggregated and included in the exported XSD file for each desktop using the panel. This is because the content can be desktop dependent.