HCRC Map Task Corpus XML File Structure
The XML Map Task corpus consists of a set of linked XML files. For information about hyperlinking see the papers about the Map Task corpus. There are:
- three files which apply to the whole corpus:
  - maptask-landmarks.xml: Information about all the landmarks on the maps
  - maptask-participants.xml: Information about the participants
  - maptask-corpus.xml: Links to annotation files for each dialogue
- two files which relate to each participant's reading of the list of landmarks on the maps they have used:
  - *.wordlist.xml: Transcription of the landmark list
  - *.citations.xml: Links from landmarks to the citation speech
- the following files which relate to each dialogue:
  - *.timed-units.xml: Timed Units, one per speaker
  - *.tokens.xml: Tokens, one per speaker
  - *.pos.xml: Part of Speech Tags, one per speaker
  - *.syn.xml: Parse Trees, one per speaker
  - *.moves.xml: Dialogue Moves, one per speaker
  - *.games.xml: Dialogue Games, one per dialogue
  - *.trans.xml: Dialogue Transactions, one per dialogue
  - *.gaze.xml: Gaze, one per speaker
  - *.drawing.xml: Drawing, one per dialogue
  - *.pr.xml: Prosody, one per speaker
  - *.landmark-refs.xml: Landmark References, one per speaker
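As a rough illustration of the per-dialogue listing above, the following Python sketch enumerates the annotation files one might expect for a single dialogue. The layer names and the per-speaker / per-dialogue split come from the list; the file naming pattern (`<dialogue>.<role>.<layer>.xml`), the role codes `g` and `f`, and the dialogue identifier `d1` are assumptions made for illustration only and may not match the names used in the distributed corpus.

```python
# Annotation layers from the list above, with the scope stated for each.
PER_SPEAKER = "speaker"
PER_DIALOGUE = "dialogue"

LAYERS = {
    "timed-units": PER_SPEAKER,
    "tokens": PER_SPEAKER,
    "pos": PER_SPEAKER,
    "syn": PER_SPEAKER,
    "moves": PER_SPEAKER,
    "games": PER_DIALOGUE,
    "trans": PER_DIALOGUE,
    "gaze": PER_SPEAKER,
    "drawing": PER_DIALOGUE,
    "pr": PER_SPEAKER,
    "landmark-refs": PER_SPEAKER,
}

# Assumed role codes for the information giver and follower (hypothetical).
ROLES = ("g", "f")


def expected_files(dialogue_id):
    """Return the file names expected for one dialogue.

    The naming pattern is an assumption, not taken from the corpus itself.
    """
    names = []
    for layer, scope in LAYERS.items():
        if scope == PER_SPEAKER:
            # One file for each of the two speakers.
            for role in ROLES:
                names.append(f"{dialogue_id}.{role}.{layer}.xml")
        else:
            # A single file shared by the whole dialogue.
            names.append(f"{dialogue_id}.{layer}.xml")
    return names


files = expected_files("d1")  # "d1" is a placeholder dialogue identifier
for name in files:
    print(name)
```

With eight per-speaker layers, two speakers, and three per-dialogue layers, the sketch lists 19 files for one dialogue.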
This diagram represents the XML files involved
in all the annotation levels listed above, for one dialogue in the
corpus.
- Green boxes represent the files which contain information about
the whole corpus
- F refers to the information follower (see the description of the corpus for details) and G refers to the information giver
- Red boxes represent files which contain timing information
- Blue boxes represent files which contain pointers to other files, or which are
pointed at themselves
- An arrow between boxes means that an element in the file at the tail of the
arrow can link to one or more elements in the file at the head of the arrow
- Dotted lines pointing to TIME signify that the file has some
direct relation to time. The speech files for each speaker were recorded on separate channels, and XML elements have start and end time attributes which give time offsets, in seconds, into the speech.
- The corpus file has links, for each dialogue, to all the XML files which pertain to it; these are the files within the dotted box in the diagram
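To make the timing description concrete, here is a minimal Python sketch that reads start and end offsets from timed elements. The sample XML is invented for illustration, and the element name `tu`, the root name `timed-units`, and the attribute names `id`, `start` and `end` are assumptions; the actual corpus files may use different names and should be checked before relying on this.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are assumptions.
SAMPLE = """<timed-units>
  <tu id="u1" start="0.00" end="0.35">okay</tu>
  <tu id="u2" start="0.35" end="0.80">start</tu>
</timed-units>"""


def read_intervals(xml_text):
    """Collect (id, start, end) for every element carrying both time attributes."""
    root = ET.fromstring(xml_text)
    intervals = []
    for elem in root.iter():
        start = elem.get("start")
        end = elem.get("end")
        if start is None or end is None:
            continue  # skip elements with no direct timing information
        # Offsets are interpreted as seconds into the speaker's speech file.
        intervals.append((elem.get("id"), float(start), float(end)))
    return intervals


intervals = read_intervals(SAMPLE)
for ident, start, end in intervals:
    print(ident, start, end)
```

Because each speaker was recorded on a separate channel, a reader would typically apply such a function to the giver's and follower's files separately.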
Last modified: Fri Sep 28 10:24:18 BST 2007