HCRC Map Task Corpus XML Annotations V1.0

The HCRC Map Task Corpus Annotations Version 1.0
Human Communication Research Centre
University of Edinburgh & University of Glasgow
Copyright (c) 2001 Human Communication Research Centre

LICENCE:

The copyright holder grants to the downloader of these files unrestricted licence to use all the corpus materials (transcription, annotation, tools, documentation) included herein, subject only to the following restriction: the contribution of HCRC is acknowledged in any public presentation or publication of any work based on the corpus.

Initial collection, transcription, annotation and publication were supported by the Economic and Social Research Council (ESRC), UK. Subsequent work was supported by the ESRC and the Engineering and Physical Sciences Research Council (EPSRC), UK, under various grants, and by the University of Edinburgh and the HCRC Language Technology Group.

The HCRC Map Task Corpus Annotations Version 1.0 carries no warranty of any kind.

Since HCRC continues to use the Corpus in our own research, we welcome contact with colleagues engaged in similar projects. For this reason, as a matter of courtesy, we ask users to tell us the topic of their intended work with these materials by writing to maptask@cogsci.ed.ac.uk.

A general explanation of the HCRC Map Task Corpus, along with how to obtain audio to accompany these annotations, can be found on the top-level page of this website.

This page contains HCRC's first public release of its annotation of the Map Task Corpus. The annotations cover dialogue structure at three levels (moves, games and transactions), part-of-speech tags, syntax, gaze, landmark references, and the times when the participants were using their pens. They are represented in XML using a technique called 'stand-off annotation' (see Isard, A. (2001) "An XML architecture for the HCRC Map Task Corpus", Proceedings of Bi-Dialog 2001, June 2001, Bielefeld, Germany; available in PS and PDF formats). The release also includes an updated transcription of the dialogues. Many of the annotations point to times in the original sound files, so the corresponding speech can be located easily.
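To make the idea of stand-off annotation concrete, here is a minimal Python sketch. The element names (`words`, `w`, `moves`, `move`), the `start`/`end` attributes and the `#id(...)..id(...)` pointer syntax are hypothetical simplifications, not the actual Map Task file format; the DTDs in the release define the real structure. The sketch shows only the general principle: an annotation layer (here, dialogue moves) does not contain the words itself but points at a range of elements in a base layer by id, and the time span of the annotation can then be recovered from the timed base layer.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified base layer: timed words with ids and times in seconds.
WORDS_XML = """<words>
  <w id="w1" start="0.10" end="0.32">go</w>
  <w id="w2" start="0.32" end="0.61">right</w>
  <w id="w3" start="0.61" end="0.80">round</w>
  <w id="w4" start="0.80" end="1.05">the</w>
  <w id="w5" start="1.05" end="1.40">swamp</w>
</words>"""

# Hypothetical annotation layer: each move points at a range of word ids
# in another file instead of containing the words (stand-off).
MOVES_XML = """<moves>
  <move id="m1" label="instruct" href="words.xml#id(w1)..id(w5)"/>
</moves>"""

# Matches "#id(first)" or "#id(first)..id(last)" (hypothetical pointer syntax).
HREF_RE = re.compile(r"#id\((?P<first>[^)]+)\)(?:\.\.id\((?P<last>[^)]+)\))?")


def resolve_move(move, words):
    """Return (text, start_time, end_time) for the word range a move points at."""
    href = move.get("href")
    match = HREF_RE.search(href)
    if match is None:
        raise ValueError("unsupported href: " + href)
    first = match.group("first")
    last = match.group("last") or first  # a single id points at one word
    ids = [w.get("id") for w in words]
    span = words[ids.index(first):ids.index(last) + 1]
    text = " ".join(w.text for w in span)
    return text, float(span[0].get("start")), float(span[-1].get("end"))


words = ET.fromstring(WORDS_XML).findall("w")
moves = ET.fromstring(MOVES_XML).findall("move")
for move in moves:
    print(move.get("label"), resolve_move(move, words))
```

Because the move layer stores only pointers, several independent annotation layers (moves, part-of-speech tags, gaze) can refer to the same base transcription without duplicating it.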

Before using these annotations, we recommend that you consider whether you would prefer to work with the same annotations converted to the format used by the NITE XML Toolkit (NXT). The two versions are very similar, but the NXT version comes with graphical user interfaces and a good search facility.

Structure of the XML annotations

Download a gzipped tar file of the entire corpus, available by FTP (10MB compressed; the unpacked corpus is 80MB). The tar file contains 2562 XML files and a dtd directory containing 15 DTD files.
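After downloading, you may want to check that the archive is complete before unpacking it. The following Python sketch, using only the standard `tarfile` module, counts the `.xml` and `.dtd` files in a gzipped tar archive. The function name `count_corpus_files` is hypothetical, and the archive path is a placeholder, since this page does not give the file name.

```python
import tarfile
from pathlib import PurePosixPath


def count_corpus_files(tar_path):
    """Count regular .xml and .dtd files inside a gzipped tar archive.

    Returns a tuple (xml_count, dtd_count). Directories and other
    non-file members are skipped.
    """
    xml_count = dtd_count = 0
    with tarfile.open(tar_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            suffix = PurePosixPath(member.name).suffix.lower()
            if suffix == ".xml":
                xml_count += 1
            elif suffix == ".dtd":
                dtd_count += 1
    return xml_count, dtd_count
```

For example, `count_corpus_files("path/to/archive.tar.gz")` on the full corpus archive should return `(2562, 15)` if the counts stated above are accurate for your download.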

A gzipped tar file of all the unused giver and follower maps in GIF format (i.e. maps on which the follower has not yet drawn a route).

Download a gzipped tar file containing one sample dialogue, together with the files that relate to the whole corpus and the DTDs.

A link to the directory containing all 2562 XML files from the tar file. This page can be used to download individual files, or to provide addresses for referring to individual files.

The Map Task Interface Demo, which runs stylesheets over this XML version of the corpus in real time to produce different views of a dialogue, depending on which annotation levels the user wishes to see. The demo includes a brief explanation of what the annotations mean.

Last modified: Fri Sep 28 10:17:40 BST 2007