Chapter 6. Introduction

A database schema defines the entire database structure - the tables and their columns, their constraints, the views, functions, and so on. The schema is represented in a Data Definition Language (DDL), a set of SQL statements which create from scratch a database with the defined structure (but no content). Effectively this is the output of pg_dump -s in PostgreSQL.

Although the "system catalog" in PostgreSQL provides schema metadata, it is not intended to meet the representational requirements of a high level data modelling language. Such a language might include the automated mapping and replication of common high level abstract structures and behaviours onto multiple lower level database objects and support using version control for schema changes. However, an external representation of the schema which does do this will not necessarily provide the rapid development cycle that can be achievable through the use of raw DDL SQL statements in PostgreSQL. The external schema representation and toolkit provided in Theon aims to achieve a happy medium of both. The approach taken is to use a "feedback loop". It starts with a synchronisation of core aspects of the raw system catalog data into the external schema representation. Here, it can be supplemented and modified at a "higher" level and then re-deployed via automatically generated "changeset" DDL SQL directly back into the live database, updating structures to conform with the new external schema representation. The result is an effective and fast production cycle for schema change management (develop, test and deploy) capable of supporting immediate live change where necessary, as well as multiple distinct (and potentially divergent) instantiations and deployments of core business data models.

Theon is used to create and maintain a database schema. Theon does not however use raw DDL SQL statements directly, but represents a database schema at a higher level in data structures, similar in many ways to the PostgreSQL system catalog itself.

The Theon external schema representation can in one sense be treated as a snapshot copy of the PostgreSQL system catalog, but in a different format which can be annotated and extended. By using an XML format, standard transformative technology can be used generically for all processes that need to use the schema. These transformation processes are then isolated from any internal system catalog changes. The external schema representation can be easily stored in a version control system. By doing so, changes in the schema can then be closely tied to any dependent changes in external applications that are stored in the same version control system. Systematic and consistent releases of the schema and applications can thereby be made at the same time.

With different versions of a schema simultaneously accessible (through a version control system) it is possible to construct an SQL DDL changeset that automatically transitions (upgrades) the structure of a physical database from its current version to the next appropriate version. This facilitates a nice "develop, test then deploy" production cycle. Different versions of databases can be out in the wild, and each can be cleanly updated at different times as appropriate to its user base. Multiple developers can be working simultaneously on the schema, and each of their changes can be independently tested in branches then merged and committed back into the main development trunk. However, developers do not lose the immediacy of direct database structure manipulation for development work (for instance being able to do "ALTER TABLE …" with instant effect). The architecture of Theon is designed to support both.

In addition to allowing the use of version control, using an external schema representation means that higher level "constructs" which result in low level database structure can easily be incorporated. A case in point would be TheonCoupler which provides a generic synchronisation architecture for bringing foreign data into a database. Here a high level description of the external data streams, source, target and mapping rules results in the automated addition of table columns, views and functions for actually implementing each distinct flow and process. Similarly templating technologies are used to automatically create schema objects in a consistent and maintainable way from any number of differently formatted sources. The external schema representation includes structures which, while not actually part of the database schema itself, are heavily interlinked to (dependent on) it: external applications such as user interfaces and reporting.

Often database schemas are rigidly defined as part of an original application requirement. Some local production service environments, however, have a much more constant shift in requirements that often requires a very dynamic response to schema changes. These changes might be to fix a bug, make a small enhancement or add entirely new structures (as well as removing now obsolete structures). This needs to be done in a controlled way on a "live" (in-service production) database. It also needs to be done in a careful way due to the number of related dependencies outside of the pure database schema definition (such as reports, downstream consumers of data and other applications) all of which almost certainly have schema specific aspects encoded within them. Finally, it must be possible for consistent releases to be made that can be deployed and seamlessly applied in other sites using the same database - the schema must be updated along with the applications, or metadata consumed by the applications. Other sites might even make devolved changes to the schema they need to manage themselves or which need to be regularly incorporated back into the master of a common database. This is the environment, approach and benefit of the architecture used in Theon.