Using Metadata to validate data

Since metadata describes the format of the data and where to find it on disk, it is used by the NITE software to validate the data as it is loaded and edited. This sort of direct validation is useful, but we also provide schema validation of data using a schema derived automatically from the metadata (via a stylesheet).

If you have already downloaded and installed NOM, you already have the schema-generating stylesheet (it is in the lib directory). With this and a stylesheet processor (Xalan is also included in the NOM distribution), you can run this command on your metadata file:

java org.apache.xalan.xslt.Process -in <your-metadata> -xsl generate-schema.xsl -out extension.xsd
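If you would rather drive the same step from Java code than from the command line, the standard JAXP transformation API can apply a stylesheet to a file. The following is only a minimal sketch: the class name GenerateSchema and its method are hypothetical and not part of NOM, and it uses whichever XSLT processor JAXP finds at run time, which may not behave identically to the Xalan command above.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the NOM distribution.
public class GenerateSchema {

    /** Applies the stylesheet to the metadata file and writes the result to out. */
    public static void generate(File metadata, File stylesheet, File out)
            throws TransformerException {
        // JAXP picks the XSLT processor available on the classpath.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        transformer.transform(new StreamSource(metadata), new StreamResult(out));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println("usage: java GenerateSchema <metadata> <stylesheet> <output>");
            System.exit(1);
        }
        generate(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

Under these assumptions, running java GenerateSchema <your-metadata> generate-schema.xsl extension.xsd should correspond to the command shown above.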

If you have a schema validator (I use xsv), you are now ready to validate some data files. Try putting these declarations:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="extension.xsd"

in the root element of your data file and then execute:

xsv <your-file>
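If xsv is not available, the validation API that ships with Java (javax.xml.validation) can perform a comparable check. This is a sketch rather than a replacement for the command above: the class name ValidateData is hypothetical, the schema is passed in explicitly instead of being located through the xsi:noNamespaceSchemaLocation declaration, and the validator bundled with the JDK may report problems differently from xsv.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the NOM distribution.
public class ValidateData {

    /**
     * Validates dataFile against schemaFile.
     * Returns null if no error was reported, otherwise the message of the
     * first problem found in either the schema or the data.
     */
    public static String validate(File schemaFile, File dataFile) throws IOException {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(schemaFile);
            Validator validator = schema.newValidator();
            // With no error handler set, the first error is thrown as a SAXException.
            validator.validate(new StreamSource(dataFile));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ValidateData <schema> <data-file>");
            System.exit(1);
        }
        String problem = validate(new File(args[0]), new File(args[1]));
        System.out.println(problem == null ? "valid" : "invalid: " + problem);
    }
}
```

For example, java ValidateData extension.xsd <your-file> would check one data file against the schema generated earlier.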

One of the main reasons for this approach to schema validation is that it can validate data in two forms: a single file as serialized by NITE, or files that have been transformed so that each nite:child element is recursively replaced by the element it points to, and each pointer is likewise replaced by its target element. The transformed form is useful for checking which types of element may be children of, or pointed to by, a given element. In this way an entire corpus can be schema validated. A stylesheet that performs this transformation is in the lib directory of your NOM distribution.

If this all seems rather involved, and your data already loads into the NOM, the program PrepareSchemaValidation.java will make a new directory for you which is fully ready for schema validation.

Validation limitations

  • all stream elements must be named nite:root;

  • all ID, start and end time attributes must use the NITE default names: nite:id, nite:start and nite:end;

  • all children and pointers must use XLink / XPointer style links;

  • stream elements will be permitted to contain inadvisably mixed elements (so long as all those elements are themselves valid and defined).
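The first limitation above is easy to check before running the validator. The sketch below tests only that one condition, using the standard Java DOM parser; the class name CheckStreamRoot is hypothetical, and the check compares the literal qualified name nite:root, so it assumes your files use the nite prefix exactly as written here.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the NOM distribution.
public class CheckStreamRoot {

    /** Returns true if the document element of dataFile is written as nite:root. */
    public static boolean hasNiteRoot(File dataFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Not namespace aware: the element name is compared as written,
        // prefix included.
        factory.setNamespaceAware(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(dataFile);
        return "nite:root".equals(doc.getDocumentElement().getNodeName());
    }

    public static void main(String[] args) throws Exception {
        for (String name : args) {
            System.out.println(name + ": "
                    + (hasNiteRoot(new File(name)) ? "ok" : "root is not nite:root"));
        }
    }
}
```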