HCRC Map Task Corpus Downloads
Annotations, including transcription
This is the main download page for the HCRC Map Task Corpus
Annotations v2.1, which are in the correct format for the
NITE XML Toolkit. For ease
of download, you can also obtain the accompanying audio and maps from
this page. If you have any difficulties with this page, please
contact maptask@cogsci.ed.ac.uk.
The HCRC Map Task Corpus Annotations Version 2.1
Human Communication Research Centre
University of Edinburgh & University of Glasgow
Copyright (c) 2007 Human Communication Research Centre
LICENSE:
The downloads on this page are licensed under a Creative Commons Attribution 4.0 International License
The annotations, which include orthographic transcription updated
from the version on the original CD-ROMs, all come together in one zip
file. The audio signals are too large to package in this way.
- HCRC Map Task Corpus annotation ZIP archive (12MB, 10-02-2011), in
  NXT format, release version 2.1. (Version 1.0 was in a pre-NXT XML
  format.) Requires NXT version 1.3.6 or higher and Java 1.4 or higher.
The only difference between annotations v2.1 (10-02-2011) and v2.0
(28-09-2007) is the inclusion of path deviation scores for all
dialogues.
Maps
The original route giver and follower maps are available as a gzipped
tar file for small format maps in GIF format, and as a zip archive for
higher resolution maps in PostScript format.
Completed route follower maps were photocopied 8 to an A4 page,
then those A4 pages were scanned to PDF. This has resulted in a loss
of resolution. However, you can get these maps either individually by
browsing the directory in which they reside, or in a combined zip
file. Individual files are named descriptively. Within each file, the
4 maps drawn with no eye contact between participants always appear on
the left of the page, with the 4 maps from the eye-contact condition on
the right. The order of maps within each half-page is consistent.
We have made the maps available from here for convenience; they are
useful for human analysts, but the NITE XML Toolkit doesn't do
anything with them.
Text Transcription
We are sometimes asked for text transcripts of Map Task
dialogues. These are simply derived using NXT from the annotations
above, but we include them here for completeness. You can get them
either individually by browsing the directory in which they reside, or
in a combined zip file.
Audio
The annotations zip file unzips into a directory called "Data".
The NXT metadata file for the annotations tells NXT to expect to find
the audio signals in a subdirectory of that called "signals". If you wish
to put them somewhere else, edit the <signals> declaration in
the metadata file to match the path that you are using. The corpus
contains two kinds of audio: dialogue recordings, and recordings of
word lists, or "citation forms" of landmark names, for each individual speaker.
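If you would rather script the <signals> edit described above than do it by hand, something like the following Python sketch may help. It is only an illustration: it assumes that the <signals> declaration stores its location in an attribute called `path` and that the metadata file does not use XML namespaces, so please check both against your own copy of the metadata file. The function name `set_signals_path` and the sample XML are made up for this example and are not part of NXT.

```python
import xml.etree.ElementTree as ET


def set_signals_path(metadata_xml: str, new_path: str, attribute: str = "path") -> str:
    """Return a copy of the metadata XML with the <signals> location changed.

    ASSUMPTION: the location lives in an attribute named `attribute`
    (default "path") on a non-namespaced <signals> element.
    """
    root = ET.fromstring(metadata_xml)
    # The element may be the root itself or nested somewhere below it.
    signals = root if root.tag == "signals" else root.find(".//signals")
    if signals is None:
        raise ValueError("no <signals> element found in the metadata")
    signals.set(attribute, new_path)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the real metadata file, for illustration only.
SAMPLE = '<corpus><signals path="signals"/></corpus>'

updated = set_signals_path(SAMPLE, "/data/maptask/audio")
print(updated)
```

Note that ElementTree discards comments and any DOCTYPE line when it rewrites a document, so keep a backup of the original metadata file, or simply make the change in a text editor.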
The audio files on the original Map Task CDs were mono, one per speaker.
The dialogue recordings provided here are instead stereo mixes of the
two mono files, in .wav format. This is to make them easier to use in
NXT.
This readme file explains some omissions and irregularities in the citation word list audio recordings that are available.
You can get audio files either individually by browsing the directory
in which they reside, or by using the form below.
1) Select one or more Map Task dialogues
The dialogue recordings are organized into eight "quads", or sets of eight
conversations drawing on four conversants - two familiar pairs -
using a Latin squares design.
For more information,
see the documentation linked from the Map Task website, particularly about the design.
There are "eye contact" and "no eye contact" versions of each quad.
You can choose recordings by quad, specifying whether you want
"eye contact" [e] or "no eye contact" [n] dialogues, or both, and
specifying which of the eight conversations you want: [1], [2], [3],
[4], [5], [6], [7], [8]. For instance, choosing q1, [e], [n], [2], and
[6] will get you recordings of dialogues q1ec2, q1nc2, q1ec6, and
q1nc6. You must choose at least one quad and one tickbox in each line
in order to select any recordings.
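To make the naming scheme in the example above concrete, here is a small Python sketch that expands a selection of quads, conditions and conversation numbers into dialogue names. It follows the pattern seen in the names above (q, quad number, "ec" or "nc", conversation number). The function name `dialogue_names` is invented for this illustration, and the code does not reproduce how the download form itself works.

```python
def dialogue_names(quads, conditions, conversations):
    """Expand a selection into dialogue names such as 'q1ec2'.

    quads:          quad numbers, each 1..8
    conditions:     "e" (eye contact) and/or "n" (no eye contact)
    conversations:  conversation numbers, each 1..8
    """
    codes = {"e": "ec", "n": "nc"}
    # Mirror the form's rule: at least one choice is needed on every line.
    if not quads or not conditions or not conversations:
        raise ValueError("choose at least one quad, condition and conversation")
    if any(q not in range(1, 9) for q in quads):
        raise ValueError("quad numbers must be between 1 and 8")
    if any(n not in range(1, 9) for n in conversations):
        raise ValueError("conversation numbers must be between 1 and 8")
    if any(c not in codes for c in conditions):
        raise ValueError('conditions must be "e" or "n"')
    return [
        f"q{q}{codes[c]}{n}"
        for q in quads
        for n in conversations
        for c in conditions
    ]


names = dialogue_names(quads=[1], conditions=["e", "n"], conversations=[2, 6])
print(names)  # ['q1ec2', 'q1nc2', 'q1ec6', 'q1nc6']
```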
The citation word form recordings are also organized by quad, but here
there are four recordings, one per speaker.
The familiar pairs of conversants are indicated by [a] and [b], and
the two conversants within a pair by [1] and [2].
Last modified: Tue Mar 22 16:52:02 GMT 2022