End user and developer questions for NXT still tend to be dealt with by private email, although we do realize that we should move over to using public forums for this. When we receive a question more than once, we try to make time to change the web pages to make the answer clear in the correct location. This page is for frequently asked questions that haven't yet found a proper home, plus their answers.
A.1. Namespacing | |
Q: |
Exactly what does |
A: | It declares the nite namespace. If you use the namespace in your data, you have to include this attribute on the root element of every data file that contains elements or attributes from it. In NXT format data, users typically namespace the reserved attribute and element names (e.g., the attributes for ids and for start and end times, and the elements for document roots, out-of-file children, and pointers) to avoid naming conflicts. |
Q: | Can I use namespacing in my data set? |
A: |
In theory namespacing is a good idea, but there is a bug in NXT's query language parser that means it can't handle
namespaced element names and attributes. For this reason, you should avoid namespacing, with the possible exception
of XML document roots (which aren't available to query anyway) and the reserved attributes that have their own
special meaning to NXT and dedicated query language syntax (the id, available as |
A.2. Fonts and Font Sizes | |
Q: | How do I change the font in an NXT GUI? |
A: | You can do whatever you want in a customized tool. The standard and configurable NXT GUIs don't specify a font, so what you get depends on your Java installation. To get different fonts for different parts of the displayed data, you must either write customized tools or contribute code to the project that lets the user specify in the configuration file which font to use for a particular element, attribute, or element's textual content. |
Q: | How do I change the font size in an NXT GUI? |
A: |
You can do whatever you want in a customized tool. The standard and configurable NXT GUIs have a
font size (usually 12 point) hard-wired in, with the exception (as of September 2006) of the
The main NXT GUI (
    <callable-programs>
      <callable-program description="20 point GenericDisplay" name="net.sourceforge.nite.gui.util.GenericDisplay">
        <required-argument name="corpus" type="corpus"/>
        <required-argument name="observation" type="observation"/>
        <required-argument name="fontsize" default="20"/>
      </callable-program>
    </callable-programs>
To pop up a window asking the user to enter the fontsize they require, use:
    <callable-programs>
      <callable-program description="20 point GenericDisplay" name="net.sourceforge.nite.gui.util.GenericDisplay">
        <required-argument name="corpus" type="corpus"/>
        <required-argument name="observation" type="observation"/>
        <required-argument name="fontsize" default="20"/>
      </callable-program>
    </callable-programs>
|
A.3. GUIs | |
Q: |
Why is the |
A: |
The |
A.4. Data Model | |
Q: | Are filenames case sensitive? |
A: | Yes. |
Q: | Can I use the same element name in two different layers? |
A: | No. NXT needs each element to belong to exactly one layer because otherwise it doesn't know how to serialize the data set, or what files to load when it requires elements of a specific type. |
Q: | Can I use the same attribute name for two different elements? |
A: | Yes. |
Q: | What kinds of properties can elements inherit from their children? |
A: | Only timing information using the reserved start and end time attributes, and this only if time inheritance is enabled for the element type involved. |
Q: | What are ids for, and what constraints are there on the values for ids? |
A: | An id can be any string that's globally unique. If you are importing data and don't have ids on it yet, you can get NXT to generate ids for you by loading the data and then saving it. Ids are used to manage the relationship between display elements in a GUI and the underlying data, and for specifying out-of-file child and pointer links. |
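As a rough illustration of the global-uniqueness constraint, the sketch below scans several XML documents and reports any id value that occurs more than once. It is not part of NXT. The namespace URI and the attribute name `id` are assumptions made for the example, so adjust them to whatever your metadata declares, and `find_duplicate_ids` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: ids are stored in an "id" attribute in the nite namespace.
# Both the URI and the attribute name are illustrative, not confirmed here.
NITE_NS = "http://nite.sourceforge.net/"
ID_ATTR = "{%s}id" % NITE_NS


def find_duplicate_ids(documents, id_attr=ID_ATTR):
    """Return {id: [file names]} for every id that occurs more than once.

    `documents` maps a file name to the XML text of that file.
    """
    seen = defaultdict(list)
    for name, text in documents.items():
        root = ET.fromstring(text)
        # iter() visits the root element and all of its descendants.
        for element in root.iter():
            value = element.get(id_attr)
            if value is not None:
                seen[value].append(name)
    return {key: files for key, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    docs = {
        "a.xml": '<r xmlns:nite="http://nite.sourceforge.net/">'
                 '<w nite:id="w1"/><w nite:id="w2"/></r>',
        "b.xml": '<r xmlns:nite="http://nite.sourceforge.net/">'
                 '<s nite:id="w1"/></r>',
    }
    print(find_duplicate_ids(docs))
```

A check like this is only useful before importing hand-built data; if you let NXT generate the ids by loading and saving, it should not be needed.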
Q: | Can elements in two structural layers point to each other? |
A: | Yes. In general, any element can point to any other element, as long as all the elements from a given layer point to elements from the same layer, and this relationship is declared in the metadata. Pointers do not have to be in featural layers; the featural layer is just useful conceptually for the kind of layer that only relates to the rest of the data set via pointers. |
A.5. Data Set Design | |
Q: | What if I want elements from one layer to be able to draw children either from some layer or from the layer that that layer draws its own children from, skipping straight to what is usually a grandchild? |
A: |
This violates the NXT data model. Suppose the
The first one is what was designed in as the preferred solution; the others are what data sets usually do. The third one may not be robust against future NXT development. |
Q: | When should I use pointers and when should I use children? |
A: | Use children whenever this is acceptable in the data model (i.e., when it doesn't create loops or require an element to have multiple, conflicting sets of children), turning off temporal inheritance if you need to. It is much easier to query elements related by hierarchy than elements related by pointer. |
Q: | How much data should I put in one XML file? |
A: | Divide your data into files by thinking about typical uses of the data. If one layer draws children from another, and the two layers always get used together (both within NXT and in external processing), then you can save some loading overhead by putting them in the same file. If, however, users may want one without the other, separate them into two files so that lazy loading can minimize the data set size in working memory. If you have an element with many attributes, most of which are rarely used, consider moving the information conveyed by those attributes into one or more files containing elements that use the old, reduced elements as children, or that point to them. This makes querying the rarely used information more cumbersome, but saves overhead in the more common uses. |
Q: | Should I represent my orthography in textual content, or use an attribute? |
A: | The original NXT developers were split between some who wanted to preserve the TEI-ish notion that the textual content is the base text and some who didn't want any privileged textual content at all. Both designs have strengths for different kinds of data sets, so it depends. Most current data sets seem to use textual content. For NXT, textual content has the following special properties:
There are cases where using textual content is less elegant, as, for instance, in parallel corpora, where there are two rival versions of the orthography of equal importance. |
Q: | What's special about ontologies? Can I search for the "top-level" code and get all the child codes? How is it reflected in the underlying data structure? |
A: | Ontologies are a way of providing type or attribute value information that isn't just a string, but where the types or values fit into a hierarchical structure in their own right. Suppose your ontology contains:
[ontol.xml]
    <foo id="id0" name="animal">
      <foo id="id1" name="bird">
        <foo id="id2" name="sparrow"/>
        <foo id="id3" name="chickadee"/>
      </foo>
      <foo id="id4" name="dog">
        <foo id="id5" name="mutt"/>
      </foo>
    </foo>
Your elements can point into the ontology:
    <el>
      <nite:pointer href="ontol.xml#id3"/>
    </el>
to get type information. You can test for chickadees:
    ($a el)($b foo):($a > $b) && ($b@name="chickadee")
but you can also test for birds in general:
    ($a el)($b foo):($a > $b)::($c foo):($c@name="bird") && ($c ^ $b)
Elements in ontologies have searchable relationships just like everything else. In another sense, ontologies aren't at all special, because you could encode the same information as a corpus-resource and still be able to access the information from the query language. Using an ontology is more restrictive because it assumes one tag name throughout the hierarchy. |
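For processing outside NXT, the "birds in general" test can be approximated by walking the ontology tree directly. The sketch below is an illustration only, not how NXT evaluates queries. It reuses the example ontology from the answer above, `ids_under` and `points_into` are hypothetical helper names, and it assumes that a pointer href ends in `#` followed by the target id, as in the example.

```python
import xml.etree.ElementTree as ET

# The example ontology from the answer above.
ONTOLOGY = """
<foo id="id0" name="animal">
  <foo id="id1" name="bird">
    <foo id="id2" name="sparrow"/>
    <foo id="id3" name="chickadee"/>
  </foo>
  <foo id="id4" name="dog">
    <foo id="id5" name="mutt"/>
  </foo>
</foo>
"""


def ids_under(ontology_xml, code_name):
    """Ids of every element named `code_name` and of all its descendants."""
    root = ET.fromstring(ontology_xml)
    ids = set()
    for node in root.iter():
        if node.get("name") == code_name:
            # node.iter() yields the node itself and everything below it.
            ids.update(d.get("id") for d in node.iter())
    return ids


def points_into(href, ids):
    """True if an href such as 'ontol.xml#id3' targets one of `ids`."""
    # Assumption: the target id is whatever follows the last '#'.
    return href.rsplit("#", 1)[-1] in ids


if __name__ == "__main__":
    bird_ids = ids_under(ONTOLOGY, "bird")
    print(sorted(bird_ids))                          # ['id1', 'id2', 'id3']
    print(points_into("ontol.xml#id3", bird_ids))    # True
    print(points_into("ontol.xml#id5", bird_ids))    # False
```

Including the "bird" element itself in the result is a choice made for this sketch, so that an element pointing directly at the top-level code also counts as a bird.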
A.6. Query Language | |
Q: |
Is there a "not dominates" operator, like |
A: |
Use e.g. |
A.7. Performance | |
Q: | What are the memory limits to NXT in loading data? |
A: | The in-memory data representation uses around 7 times the disk storage space for the same data, or a bit less. If lazy loading is on, only the files that are actually needed are loaded. |
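The rule of thumb above can be turned into a quick back-of-the-envelope estimate. The sketch below multiplies the total size of the files you expect to be loaded by the factor of 7 quoted in the answer. The function names are hypothetical, and because the answer says "or a bit less", the result is best read as a rough upper estimate rather than a measurement.

```python
import os

# Factor taken from the answer above ("around 7 times ... or a bit less").
EXPANSION_FACTOR = 7


def estimate_memory_bytes(file_sizes, factor=EXPANSION_FACTOR):
    """Rough in-memory size for files with the given on-disk sizes in bytes."""
    return factor * sum(file_sizes)


def sizes_on_disk(paths):
    """On-disk sizes in bytes of the files that will actually be loaded."""
    return [os.path.getsize(path) for path in paths]


if __name__ == "__main__":
    # Two files of 1 MB and 2 MB on disk: roughly 21 MB in memory.
    print(estimate_memory_bytes([1_000_000, 2_000_000]))  # 21000000
```

With lazy loading on, pass only the files your tool or query actually touches, since the other files are never read.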