Architecture Guide

GARM acts on file collections. At its simplest, a GARM install operation copies files from one collection to another, while a GARM uninstall operation removes the files in one collection from the other. GARM's power comes from the ability to form layers of collections, with each successive collection adding some behaviour or functionality to the one below it. These chains of collections are capable of representing very complex semantics.

When adding functionality to GARM, this functionality is usually best encapsulated by creating another layer, and then instructing GARM to use that layer with certain types of collections. Chains of layers are identified by the user with references, and GARM assumes the responsibility of turning a given reference (string) into a suitable collection.

The other aspect of GARM's handling of collections concerns meta-data. When GARM was written, the precise mechanism that would be used to store meta-data about collections was not clear, so this functionality is separated out from the individual collections themselves, which only interact with a PersistenceMechanism interface. This mechanism both saves new collections from having to implement their own persistence scheme and enforces a degree of consistency across the application. If at some later point the current (XML-based) persistence scheme should prove undesirable, it can be replaced easily because all the persistence functionality is encapsulated in one place.

The GARM class itself offers a single point of configuration, shared between all executables in the GARM suite. As the suite evolves, more and more of the decisions made in this class are likely to be taken from system properties rather than being hardcoded. This class is where concrete implementations of each of the many interfaces present in the suite are created and shared.

File Collections and VFiles

Central to everything in GARM are FileCollections and their associated VFiles. A VFile can be thought of as the abstraction of a file away from its existence on the local filesystem. This enables many different sources of data to be modelled as files, for example entries in an archive and files in a local directory, not to mention elements in an XML descriptor. A VFile provides only a few basic operations, the most important being to identify its name, type, hierarchy and modification date, and to retrieve its data as an InputStream.
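The operations listed above can be pictured as a small interface. The following is only a sketch: apart from stream() and the FileType constants named in this guide, the method names and the MemoryVFile class are assumptions made for illustration and are not taken from the GARM source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stand-in for GARM's file types; only the constants named in this guide are listed.
enum FileType { UNKNOWN, LIBRARY, GENERATED }

// Hypothetical shape of a VFile; every method name except stream() is an assumption.
interface VFile {
    String name();
    FileType type();
    String hierarchy();
    long lastModified();
    InputStream stream() throws IOException;
}

// An untyped VFile held entirely in memory, showing that the data
// need not exist as a file on the local filesystem.
final class MemoryVFile implements VFile {
    private final String name;
    private final String hierarchy;
    private final long lastModified;
    private final byte[] data;

    MemoryVFile(String name, String hierarchy, long lastModified, byte[] data) {
        this.name = name;
        this.hierarchy = hierarchy;
        this.lastModified = lastModified;
        this.data = data.clone();
    }

    public String name() { return name; }
    public FileType type() { return FileType.UNKNOWN; } // untyped files report UNKNOWN
    public String hierarchy() { return hierarchy; }
    public long lastModified() { return lastModified; }
    public InputStream stream() { return new ByteArrayInputStream(data); }
}
```

A caller sees the same five operations regardless of whether the data comes from memory, a directory or an archive.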

The type and hierarchy fields of a VFile merit more discussion. VFiles may be either typed or untyped. For simplicity, untyped VFiles return their type as FileType.UNKNOWN. This type information represents the information that GARM can deduce about this file from its position in some directory structure, its name, or some other criterion. The LayoutCollection layer will take a collection of untyped VFiles and expose a collection of VFiles typed according to some FileLayout.

For an untyped VFile, the hierarchy field represents the relative path to the directory containing the VFile within the collection. This is best explained by an example. Suppose we have a directory "/usr/foo" containing a file "/usr/foo/bar/goo/zoom.cfg". The directory would expose this as a VFile with the name "zoom.cfg" and the hierarchy "bar/goo/". When the VFile is typed, the hierarchy is slightly different in that it is now relative to the root for that type in the collection. Taking the same example, if the directory is now covered by a LayoutCollection identifying all files in "bar/" as FileType.LIBRARY, then the same file will be exposed with name "zoom.cfg", type FileType.LIBRARY and hierarchy "goo/".
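The worked example above can be reproduced with a few lines of string handling. This is a sketch under the assumption that paths use "/" as separator; the class and method names are hypothetical and do not come from GARM.

```java
// Hypothetical helper reproducing the hierarchy rules described in this guide.
final class HierarchyExample {

    // File name without any directory part.
    static String name(String filePath) {
        return filePath.substring(filePath.lastIndexOf('/') + 1);
    }

    // Untyped hierarchy: the parent directory relative to the collection root,
    // with a trailing slash (empty for files directly in the root).
    static String untypedHierarchy(String collectionRoot, String filePath) {
        String root = collectionRoot.endsWith("/") ? collectionRoot : collectionRoot + "/";
        if (!filePath.startsWith(root)) {
            throw new IllegalArgumentException(filePath + " is not inside " + collectionRoot);
        }
        String relative = filePath.substring(root.length());
        int lastSlash = relative.lastIndexOf('/');
        return lastSlash < 0 ? "" : relative.substring(0, lastSlash + 1);
    }

    // Typed hierarchy: the untyped hierarchy made relative to the root of its type.
    static String typedHierarchy(String untypedHierarchy, String typeRoot) {
        if (!untypedHierarchy.startsWith(typeRoot)) {
            throw new IllegalArgumentException(untypedHierarchy + " is not under " + typeRoot);
        }
        return untypedHierarchy.substring(typeRoot.length());
    }
}
```

With the values from the text, untypedHierarchy("/usr/foo", "/usr/foo/bar/goo/zoom.cfg") gives "bar/goo/", and typedHierarchy("bar/goo/", "bar/") gives "goo/".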

References and Reference Spaces

A reference is a way for the user to describe a collection to GARM. It is the job of a ReferenceSpace within GARM to take a reference string and create a suitable collection for it. In practice, it is also important that the reference space only create a single collection chain for each distinct reference to preserve consistency.

The default reference space takes a Map argument to its constructor specifying the mappings from reference type identifiers to CollectionBuilder objects. Each reference in this space has the form type:identifier. The type part may be any string (not including a colon of course), while identifier may be any free format string. The identifier is passed "as is" to the collection builder identified by the type part. In practice, the identifier is used to hold the data needed for the collection builder to create the root of the chain. The other layers of the chain are configured when the CollectionBuilder objects themselves are created.

For example, take a reference to a GAR on the local filesystem. In the current version of GARM, this reference has the form local-gar:/usr/foo/bar.gar. The default reference space is initialised with a map containing a key "local-gar" and an associated value of a GARBuilder object. This builder will be passed "/usr/foo/bar.gar", which it will first pass to an ArchiveMaker (see later) to obtain a collection representing the archive. This collection will then have stacked on top of it a LayoutCollection with a GARLayout as a parameter. The resulting collection is typed, and the streams returned by its files will be ZIPInputStreams created from the archive on disk.
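The lookup described in this section might be sketched as follows. The interfaces are reduced to a single method each, and the names SimpleReferenceSpace, resolve and build are assumptions for illustration; the real ReferenceSpace and CollectionBuilder signatures may differ. The cache reflects the requirement that only one collection chain is created per distinct reference.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the GARM types; the method names are assumptions.
interface FileCollection { String describe(); }
interface CollectionBuilder { FileCollection build(String identifier); }

final class SimpleReferenceSpace {
    private final Map<String, CollectionBuilder> builders;
    // One chain per distinct reference, so repeated lookups stay consistent.
    private final Map<String, FileCollection> chains = new HashMap<>();

    SimpleReferenceSpace(Map<String, CollectionBuilder> builders) {
        this.builders = new HashMap<>(builders);
    }

    FileCollection resolve(String reference) {
        FileCollection cached = chains.get(reference);
        if (cached != null) {
            return cached;
        }
        // Split at the first colon: the type cannot contain one, the identifier can.
        int colon = reference.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Expected type:identifier, got " + reference);
        }
        String type = reference.substring(0, colon);
        String identifier = reference.substring(colon + 1);
        CollectionBuilder builder = builders.get(type);
        if (builder == null) {
            throw new IllegalArgumentException("No collection builder for type " + type);
        }
        // The identifier is handed to the builder unchanged.
        FileCollection chain = builder.build(identifier);
        chains.put(reference, chain);
        return chain;
    }
}
```

Resolving "local-gar:/usr/foo/bar.gar" would pass "/usr/foo/bar.gar" to whichever builder is registered under "local-gar", and a second lookup of the same reference returns the same chain.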

The exact mechanism used to create a collection from a reference will depend upon many factors, including the reference space in use by GARM (currently configurable only at compile time), the collection builders installed (assuming the default reference space is used, other reference spaces might have other mechanisms) and the reference itself.

Meta-data Persistence

As mentioned in the overview, GARM separates out the task of meta-data persistence from the individual file collection classes. Instead, each class needing it takes a PersistenceMechanism argument to its constructor, which it can use to load its meta-data. Since the type of the meta-data is likely to depend on the class of the file collection, this generates a relatively high degree of coupling. In return, it makes switching between persistence mechanisms at a later date easier to manage.

A persistence mechanism implements just two methods, one to save the files for a given collection and one to load them. There is an implicit assumption that all the collection's meta-data is stored in VFile-derived objects. This is likely to be true in general since collections are just collections of VFiles and should not store data unrelated to any file in them. In order to identify the correct meta-data, the persistence mechanism is also provided with the reference for the chain the collection forms a part of. This is so that, for example, DependentCollections can be used on two unrelated containers on the same machine without their meta-data colliding.

The persistence mechanism used in the initial version of GARM is based on the idea of XML files. Each collection stores its data in an XML file in a configurable storage directory. Using the combination of the collection class and the chain reference, a unique and reproducible filename within the storage directory is derived for each persistence request. This file is passed to an XMLNode for the specific collection class, which parses it and returns an array of VFiles.
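This guide does not state how the filename is derived, so the following is only one possible scheme, shown to make the "unique and reproducible" requirement concrete. It assumes the class name and a hex encoding of the chain reference are joined into an ".xml" filename; the StorageNames class and fileFor method are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

// Hypothetical naming scheme; GARM's actual derivation is not described in this guide.
final class StorageNames {

    static File fileFor(File storageDir, Class<?> collectionClass, String chainReference) {
        // Hex-encode the reference so characters such as ':' and '/' cannot
        // produce an invalid or ambiguous filename.
        StringBuilder hex = new StringBuilder();
        for (byte b : chainReference.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        // The same (class, reference) pair always yields the same file, and
        // different pairs yield different files.
        return new File(storageDir, collectionClass.getName() + "-" + hex + ".xml");
    }
}
```

Because the encoding is reversible, two distinct references can never map to the same file, at the cost of long filenames for long references.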

The Standard Collection Types

The collections in the first release of GARM fall into two categories: concrete collections and meta-collections. The intent, at least when creating these two types, was that concrete collections provide real files and streams and are a simple view of something existing on the system. The Directory and ZIPArchive classes are concrete collections of this kind. Less intuitively, the LayoutCollection is also a concrete collection because it creates an entirely new view of the collection below it (by adding type information).

The meta-collections present in the first release are broader in scope and less closely related to one another. In principle a meta-collection does not create a new view of the files available; it merely stores information about the changes in lower levels and provides additional behaviours to structure those changes. The currently existing meta-collections are the PolicyCollection, GeneratedCollection, DependentCollection and HistoryCollection.

The Directory collection creates a file collection derived from a directory in the local file system. It searches the given directory recursively, adding every file in it and naming itself as the source. When such a file is read using stream(), it returns a new FileInputStream for the file in question. In a similar way, the ZIPArchive collection provides a set of files derived from reading the contents of a ZIP archive on the local disk. When one of its files is stream()ed, it returns a new ZIPInputStream for the relevant ZIP entry.

The LayoutCollection actually performs two functions. Firstly, it classifies the types of files in an underlying collection and exposes a typed collection derived from it. When one of these files is opened with stream(), the stream method of the underlying untyped file will be called. Secondly, it stores origin information for each file added to the system in the form of a reference to the collection from which the file came. This can be used to reconstruct a collection which has been damaged (provided the file in question was not installed outside of GARM, and hence registered as coming from the underlying collection anyway). The decision was taken not to split this dual-function layer into two separate layers, since the split proved overly artificial and served no real purpose.

The PolicyCollection takes a Policy argument specifying when a new file should overwrite an old file. The current policies are FreshenPolicy (overwrite if older), ReplacePolicy (always overwrite) and PreservePolicy (never overwrite). The policy collection interface will shortly be expanded to allow policies to be specified per file type, together with a default. This may occur before the first release or soon thereafter.
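The three policies can be sketched as implementations of a single decision method. The method name shouldOverwrite and its timestamp parameters are assumptions; this guide gives only the class names and their intent. "Overwrite if older" is read here as "the existing file is older than the incoming one".

```java
// Hypothetical Policy interface; only the three class names come from this guide.
interface Policy {
    // Decide whether an incoming file should replace an existing one,
    // given their modification times in milliseconds.
    boolean shouldOverwrite(long existingModified, long incomingModified);
}

// Overwrite only when the existing file is older than the incoming file.
final class FreshenPolicy implements Policy {
    public boolean shouldOverwrite(long existingModified, long incomingModified) {
        return existingModified < incomingModified;
    }
}

// Always overwrite, regardless of age.
final class ReplacePolicy implements Policy {
    public boolean shouldOverwrite(long existingModified, long incomingModified) {
        return true;
    }
}

// Never overwrite: the file already present wins.
final class PreservePolicy implements Policy {
    public boolean shouldOverwrite(long existingModified, long incomingModified) {
        return false;
    }
}
```

Keeping the decision behind one interface is what would let a future version choose a policy per file type with a default fallback.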

A GeneratedCollection takes a Generator argument which will silently squash attempts to copy certain files into the underlying collection, and instead use them to create new files. These files are always marked as FileType.GENERATED. This is the mechanism by which GARM deploys ".wsdd" files: a generator object intercepts the standard "server-deploy.wsdd" and "client-deploy.wsdd" files and uses them to recreate "server-config.wsdd" and "client-config.wsdd".

The DependentCollection and HistoryCollection classes provide the most useful features of GARM: dependency-tracked uninstallation and rollback. Dependent collections will not allow a file to be removed from the collection if files from another source failed to overwrite it and have not yet been removed. Equally importantly, history collections will reinstall an older version of a file that had been overwritten if its newer counterpart is uninstalled. Together these provide dependency tracking and rollback. One caveat in the implementation of DependentCollection is that it must maintain dependency information for VFiles from all sources, even those no longer present in the collection, so that dependency information is not lost on rollback.
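The rollback behaviour of a history collection can be illustrated with a stack of versions per file name. This is a simplified model, not the HistoryCollection implementation: it tracks only which source currently provides each file, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified model of rollback: each file name keeps a stack of the sources
// that installed it, newest on top.
final class HistorySketch {
    private final Map<String, Deque<String>> versions = new HashMap<>();

    // Installing a file pushes the new source on top of any older versions.
    void install(String fileName, String source) {
        versions.computeIfAbsent(fileName, key -> new ArrayDeque<>()).push(source);
    }

    // Uninstalling removes that source's version; if it was the newest,
    // the previous version becomes current again.
    void uninstall(String fileName, String source) {
        Deque<String> stack = versions.get(fileName);
        if (stack == null) {
            return;
        }
        stack.remove(source);
        if (stack.isEmpty()) {
            versions.remove(fileName);
        }
    }

    // The source whose version of the file is currently visible, or null.
    String current(String fileName) {
        Deque<String> stack = versions.get(fileName);
        return stack == null ? null : stack.peek();
    }
}
```

If "gar1" and then "gar2" install the same file, uninstalling "gar2" makes the "gar1" version current again, which is the rollback described above.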

Descriptors, Remote Archives and ArchiveMakers

A major problem with a naive GARM implementation which simply uses an exported NFS mount to store GAR files is the volume of data transfer which must occur every time GARM is run. In order to allow for GARs to be upgraded, it is necessary to check the modification dates and sometimes even the contents of individual GAR files. Done naively, these tests would require almost every GAR to be downloaded across a network link every time GARM was run.

To avoid this problem, GARM uses a system of XML descriptors which list the important data about a GAR file in a much more succinct form. Currently such descriptors store only modification times for the files in the GAR and the GAR itself, together with the names of the files in the GAR and a URL to obtain the GAR from. Typically, the size of such descriptors is under 1k.

When using a RemoteArchive collection in GARM, the descriptor is downloaded immediately when the archive is created. After that, the GAR itself will only be downloaded when an attempt is made to stream() one of its files. At this point the collection will download the GAR from the given URL, use an ArchiveMaker to construct a suitable archive from the result and then act as a proxy to the new archive object. There is an assumption that the descriptor accurately reflects the GAR; if it does not, inconsistencies may result.
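The deferred download can be sketched as a lazy proxy. To keep the example self-contained, the network fetch and the ArchiveMaker step are replaced by a Supplier returning the archive contents as a map; that substitution, and every name in the block, is an assumption rather than GARM's actual API.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of a remote archive that knows its file names from the descriptor
// and fetches the GAR only when a file is first streamed.
final class LazyRemoteArchive {
    private final List<String> describedFiles;                 // from the descriptor
    private final Supplier<Map<String, byte[]>> download;      // stands in for URL fetch + ArchiveMaker
    private Map<String, byte[]> archive;                       // null until first stream()

    LazyRemoteArchive(List<String> describedFiles, Supplier<Map<String, byte[]>> download) {
        this.describedFiles = new ArrayList<>(describedFiles);
        this.download = download;
    }

    // Answered from the descriptor alone; no download is triggered.
    List<String> fileNames() { return describedFiles; }

    boolean isDownloaded() { return archive != null; }

    InputStream stream(String name) {
        if (archive == null) {
            archive = download.get(); // first access: fetch the GAR once
        }
        byte[] data = archive.get(name);
        if (data == null) {
            // The descriptor and the GAR disagree, the inconsistency noted above.
            throw new IllegalStateException("GAR does not contain " + name);
        }
        return new ByteArrayInputStream(data);
    }
}
```

Listing files and comparing modification times never touches the network beyond the small descriptor; only the first stream() call pays for the full GAR.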

ArchiveMakers provide a simple and extensible way to create an Archive collection from a file. They may use any aspect of the file to determine what compression method it uses, and expose the file as a collection of VFiles. The default implementation only recognises ".gar" files as ZIP compressed archives, but this is sufficient for all existing GARs. In practice, abstracting this out may have been somewhat paranoid, but should the compression method or extensions used by GARs ever change, GARM will be well placed to deal with it. One possible application would be to use ".tar.bz2" files as GARs for better compression ratios.
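A minimal version of the default behaviour might look like the following. The ArchiveMaker interface is reduced here to listing entry names, which is an assumption made to keep the sketch short; the real interface returns an Archive collection. The ZIP handling uses the standard java.util.zip.ZipFile class.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Reduced stand-in for GARM's ArchiveMaker; the method name is an assumption.
interface ArchiveMaker {
    List<String> listEntries(File file) throws IOException;
}

// Recognises only ".gar" files and treats them as ZIP archives.
final class GarArchiveMaker implements ArchiveMaker {
    public List<String> listEntries(File file) throws IOException {
        if (!file.getName().endsWith(".gar")) {
            throw new IllegalArgumentException("Unrecognised archive type: " + file.getName());
        }
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory()) {
                    names.add(entry.getName()); // e.g. "etc/zoom.cfg"
                }
            }
        }
        return names;
    }
}
```

Supporting another format would mean adding a second implementation that recognises a different extension, without touching the collections that consume the result.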

The GARM class itself

The GARM class represents the place where all the running decisions that affect the configuration of the GARM suite are made. In time, most of this will be accessible via Java system properties, but at present most are hard coded. GARM provides global access to a ReferenceSpace (so that all references used in a single tool invocation are consistent), an ArchiveMaker (for similar reasons), and the URL of the repository GARM is using for remote accesses.

The only other functions provided by GARM are standardised install() and uninstall() methods. These accept both FileCollection and String (reference) arguments.
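The relationship between the two argument styles can be sketched as overloads, with the String versions resolving their references and delegating to the collection versions. Collections are modelled here as plain maps from file name to contents, and the reference space as a lookup function; both are simplifications, and the exact signatures of install() and uninstall() are assumptions.

```java
import java.util.Map;
import java.util.function.Function;

// Sketch only: collections are modelled as maps of file name to contents.
final class GarmSketch {
    // Stands in for the shared ReferenceSpace.
    private final Function<String, Map<String, byte[]>> referenceSpace;

    GarmSketch(Function<String, Map<String, byte[]>> referenceSpace) {
        this.referenceSpace = referenceSpace;
    }

    // Install: copy every file from the source collection into the target.
    void install(Map<String, byte[]> source, Map<String, byte[]> target) {
        target.putAll(source);
    }

    // Uninstall: remove from the target every file the source contains.
    void uninstall(Map<String, byte[]> source, Map<String, byte[]> target) {
        target.keySet().removeAll(source.keySet());
    }

    // Reference overloads resolve both sides, then delegate.
    void install(String sourceRef, String targetRef) {
        install(referenceSpace.apply(sourceRef), referenceSpace.apply(targetRef));
    }

    void uninstall(String sourceRef, String targetRef) {
        uninstall(referenceSpace.apply(sourceRef), referenceSpace.apply(targetRef));
    }
}
```

In real chains the target would be a layered collection, so the policy, dependency and history behaviour described earlier would apply during the copy rather than a plain putAll.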