HCRC Map Task Corpus Downloads

Annotations, including transcription

This is the main download page for the HCRC Map Task Corpus Annotations v2.1, which are in the correct format for the NITE XML Toolkit. For ease of download, you can also obtain the accompanying audio and maps from this page. If you have any difficulties with this page, please contact maptask@cogsci.ed.ac.uk.

The HCRC Map Task Corpus Annotations Version 2.1
Human Communication Research Centre
University of Edinburgh & University of Glasgow
Copyright (c) 2007 Human Communication Research Centre

LICENSE:

The downloads on this page are licensed under a Creative Commons Attribution 4.0 International License.

The annotations, which include orthographic transcription updated from the version on the original CD-ROMs, come all together in one zip file. The signals are too large to package in this way.

HCRC Map Task Corpus annotation ZIP archive (12MB, 10-02-2011), in NXT format, release version 2.1. (Version 1.0 was in a pre-NXT XML format.) Requires NXT version 1.3.6 or higher and Java 1.4 or higher. The only difference between annotations v2.1 (10-02-2011) and v2.0 (28-09-2007) is the inclusion of path deviation scores for all dialogues.

Maps

The original route giver and follower maps are available as a gzipped tar file for small-format maps in GIF format, and as a zip archive for higher-resolution maps in PostScript format.

Completed route follower maps were photocopied eight to an A4 page, and those pages were then scanned to PDF, which has resulted in a loss of resolution. You can get these maps either individually by browsing the directory in which they reside, or in a combined zip file. Individual files are named descriptively. Within each file, the four maps drawn with no eye contact between participants always appear on the left of the page, with the eye-contact condition on the right. The order of maps within each half-page is consistent.

We have made the maps available from here for convenience; they are useful for human analysts, but the NITE XML Toolkit doesn't do anything with them.

Text Transcription

We are sometimes asked for text transcripts of Map Task dialogues. These are simply derived by running NXT on the annotations above, but we include them here for completeness. You can get them either individually by browsing the directory in which they reside, or in a combined zip file.

Audio

The annotations zip file unzips into a directory called "Data". The NXT metadata file for the annotations tells NXT to expect to find the audio signals in a subdirectory of "Data" called "signals". If you wish to put them somewhere else, edit the <signals> declaration in the metadata file to match the path that you are using. The corpus contains two kinds of audio: dialogue recordings, and recordings of word lists, or "citation forms" of landmark names, for each individual speaker. The audio files on the original Map Task CDs were mono, one per speaker. For the dialogue recordings, the files provided here are stereo mixes of the two mono files, in .wav format. This is to make them easier to use in NXT.
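As a rough illustration of the <signals> edit described above, the following Python sketch rewrites the location on a <signals> element. It makes several assumptions: the fragment in SAMPLE_METADATA is hypothetical and heavily trimmed, the attribute name "path" is assumed rather than taken from this page, and the target directory is invented. Check all of these against the metadata file in the download before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the NXT metadata file.
# The real file contains many more declarations; the "path" attribute
# name used here is an assumption to verify against the real file.
SAMPLE_METADATA = """<corpus id="maptask">
  <signals path="signals"/>
</corpus>"""


def set_signals_path(metadata_xml: str, new_path: str) -> str:
    """Return the metadata XML with the <signals> location replaced."""
    root = ET.fromstring(metadata_xml)
    signals = root.find(".//signals")
    if signals is None:
        raise ValueError("no <signals> element found in metadata")
    signals.set("path", new_path)
    return ET.tostring(root, encoding="unicode")


# "/data/maptask/audio" is an invented example location.
updated = set_signals_path(SAMPLE_METADATA, "/data/maptask/audio")
print(updated)
```

Note that ElementTree discards comments and any DOCTYPE when it re-serializes a document, so for a one-off change, editing the declaration by hand in a text editor may be the safer route.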

This readme file explains some omissions and irregularities in the citation word list audio recordings that are available.

You can get audio files either individually by browsing the directory in which they reside, or by using the form below.

1) Select one or more Map Task dialogues

The dialogue recordings are organized into eight "quads", or sets of eight conversations drawing on four conversants - two familiar pairs - using a Latin square design. For more information, see the documentation linked from the Map Task website, particularly about the design. There are "eye contact" and "no eye contact" versions of each quad. You can choose recordings by quad, specifying whether you want "eye contact" [e] or "no eye contact" [n] dialogues, or both, and specifying which of the eight conversations you want: [1], [2], [3], [4], [5], [6], [7], [8]. For instance, choosing q1, [e], [n], [2], and [6] will get you recordings of dialogues q1ec2, q1nc2, q1ec6, and q1nc6. You must choose at least one quad and at least one tickbox in each line in order to select any recordings.
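The naming pattern in the example above (quad, then eye-contact condition, then conversation number) can be sketched in Python as follows. The function name dialogue_names and its arguments are hypothetical, introduced only for illustration; the sketch reproduces the dialogue names given in the example and says nothing about file extensions or how the download form itself is implemented.

```python
def dialogue_names(quads, conditions, conversations):
    """Expand form-style selections into dialogue names such as 'q1ec2'.

    quads:         quad numbers, each 1-8
    conditions:    'e' (eye contact) and/or 'n' (no eye contact)
    conversations: conversation numbers, each 1-8
    """
    for quad in quads:
        if quad not in range(1, 9):
            raise ValueError(f"quad must be 1-8, got {quad!r}")
    for cond in conditions:
        if cond not in ("e", "n"):
            raise ValueError(f"condition must be 'e' or 'n', got {cond!r}")
    for conv in conversations:
        if conv not in range(1, 9):
            raise ValueError(f"conversation must be 1-8, got {conv!r}")

    # An empty selection on any line yields no recordings, mirroring the
    # rule that at least one choice is needed on each line.
    return [
        f"q{quad}{cond}c{conv}"
        for quad in quads
        for conv in conversations
        for cond in conditions
    ]


print(dialogue_names([1], ["e", "n"], [2, 6]))
# prints ['q1ec2', 'q1nc2', 'q1ec6', 'q1nc6']
```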

The citation word form recordings are also organized by quad, but here there are four recordings, one per speaker. The familiar pairs of conversants are indicated by [a] and [b] and the two conversants within a pair as [1] and [2].

Dialogue Recordings: e n c1 c2 c3 c4 c5 c6 c7 c8
Citation Word List Recordings: e n a b 1 2

2) Press the button

Last modified: Tue Mar 22 16:52:02 GMT 2022