The Design of the HCRC Map Task Corpus

Motivation for producing the Map Task Corpus

The HCRC Map Task Corpus was produced in response to one of the core problems of work on natural language: much of our knowledge of language is based on scripted materials, despite most language use taking the form of unscripted dialogue with specific communicative goals. There is, of course, good reason for this. There is no guarantee that the phenomena of theoretical interest will appear with any frequency in naturally occurring speech. Even huge corpora may fail to provide sufficient instances to support any strong claims about the phenomenon under study. In addition, there is the problem of context: critical aspects of both linguistic and extralinguistic context may be either unknown or uncontrolled. Prepared materials may lack spontaneity, but they are designed to elicit specific examples of linguistic behaviour in controlled conditions and so ensure that particular research needs are met. Our intention, therefore, was to elicit unscripted dialogues in such a way as to boost the likelihood of occurrence of certain linguistic phenomena, and to control some of the effects of context. To this extent, while our dialogues are spontaneous, the corpus as a whole constitutes a large, carefully controlled elicitation exercise. The choice of variables manipulated in the design reflects the different interests of the researchers involved in this data-gathering effort.

The Map Task design was intended to provide a common corpus for a vertical study of dialogue, generating material which can be discussed at levels from the acoustic to the sociolinguistic. All the relevant parameters incorporated in the design are described here. The various forms in which the resulting corpus is available are described on the top-level page of the HCRC Map Task Corpus website.

Task Description

The Map Task is a cooperative task involving two participants. The two speakers sit opposite one another and each has a map which the other cannot see. One speaker -- designated the Instruction Giver -- has a route marked on her map; the other speaker -- the Instruction Follower -- has no route. The speakers are told that their goal is to reproduce the Instruction Giver's route on the Instruction Follower's map. The maps are not identical and the speakers are told this explicitly at the beginning of their first session. It is, however, up to them to discover how the two maps differ.

Map Design

All maps consist of landmarks -- or features -- portrayed as line drawings and labelled with their intended name. The differences between the maps result from the systematic manipulation of a design variable we refer to as sharedness: the extent to which features contrast or are shared between pairs of maps. Features were deemed common if the identical form and label appeared in the identical location on both the Giver's and the Follower's map. Features which were not common differed in one of three ways:

Absent/Present features were found on one map but not the other;
Name Change features were identical in form and location but had different labels on the two maps;
2:1 features appeared twice on the Giver's map, once in a position close to the route and once more distant, while the Follower had only the distant irrelevant one.

All map routes begin with a starting point, marked on both maps, and end with a finishing point marked only on the Instruction Giver's map. Both start and end points are adjacent to a common feature but landmarks between these points alternate in sharedness.

This manipulation of mismatches between landmarks enables us to control the information initially shared by the participants.
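The sharedness categories above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the Feature record, the grid coordinates, and the label "old gate" are hypothetical and are not taken from the corpus materials, and the corpus itself does not ship such a function.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Feature:
    form: str                  # what the line drawing depicts
    label: str                 # the name printed beside it
    location: Tuple[int, int]  # hypothetical grid position on the map


def classify_features(giver: List[Feature],
                      follower: List[Feature]) -> Dict[Tuple[int, int], str]:
    """Label every occupied location with a sharedness category."""
    g_at = {feat.location: feat for feat in giver}
    f_at = {feat.location: feat for feat in follower}
    result = {}
    for loc in sorted(set(g_at) | set(f_at)):
        gf, ff = g_at.get(loc), f_at.get(loc)
        if gf is not None and ff is not None:
            if (gf.form, gf.label) == (ff.form, ff.label):
                result[loc] = "common"
            elif gf.form == ff.form:
                result[loc] = "name change"
            else:
                result[loc] = "unclassified"  # not a case the design describes
        elif gf is not None and any(
                (o.form, o.label) == (gf.form, gf.label) for o in follower):
            # Giver-only copy of a feature the Follower has elsewhere.
            result[loc] = "2:1"
        else:
            result[loc] = "absent/present"
    return result


# Hypothetical pair of maps; positions and "old gate" are invented.
giver_map = [
    Feature("meadow", "vast meadow", (1, 1)),
    Feature("gate", "broken gate", (2, 3)),
    Feature("tree", "chestnut tree", (4, 2)),
    Feature("fields", "reclaimed fields", (5, 5)),   # copy near the route
    Feature("fields", "reclaimed fields", (8, 1)),   # distant copy
]
follower_map = [
    Feature("meadow", "vast meadow", (1, 1)),
    Feature("gate", "old gate", (2, 3)),
    Feature("fields", "reclaimed fields", (8, 1)),
]
categories = classify_features(giver_map, follower_map)
```

In this sketch the distant copy of a 2:1 feature is reported as common, since it appears identically on both maps; only the Giver's extra copy near the route is flagged as 2:1.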

Since the only constraint on the range of map landmarks is the ease with which the feature can be represented graphically (that is, choice is restricted only by the ingenuity of the artist) we were able to include landmark names of phonological interest. Thus, feature names provided sites for four optional phonological reduction processes:

/t/-deletion eg vast meadow
/d/-deletion eg reclaimed fields
glottalisation eg chestnut tree
nasal assimilation eg broken gate

Landmark names also provided examples of polysyllabic words with differing metrical structure (eg initial S-W words like buffalo and initial W-S words like baboons).

Example Instruction Giver's map and matching Instruction Follower's map.

Familiarity and Eye-Contact

In addition to the design variables relating to the maps themselves, two other variables were incorporated in the design of the corpus overall.

Subjects are necessarily paired for the task, and since the pairing is under the experimenter's control we were able to vary systematically the familiarity between the participants, by asking subjects to attend with a friend. Each pair of familiar subjects was tested in coordination with another pair who were unknown to either member of the first pair. Two pairs formed a quadruple of subjects who used among them a different set of four map-pairs, with maps being assigned to pairs by Latin Square. Each subject participated in four dialogues, twice as Instruction Giver and twice as Instruction Follower, once in each case with a familiar partner, and once with an unfamiliar partner. As Instruction Giver they gave directions on the same map, but when following they used different maps each time. Half of the subjects gave instructions to a familiar partner first, the others to an unfamiliar partner first.

The option of placing a small barrier between Map Task participants to prevent them from seeing each other's faces allowed us to control the availability of the visual channel for communication. Half of the subjects who took part in the task were able to make eye-contact with their partner, while the other half had no eye-contact.

Procedure

Subjects sat three or four feet apart, facing each other across a desk, with their maps placed on sloping boards, to prevent each subject seeing the other's map. Pairs of subjects were randomly assigned to one of the two "eye-contact" conditions.

After they had completed their map dialogues, subjects were asked to read a wordlist containing all the feature names from the set of maps they had encountered. Feature names appeared twice in random order, and subjects were asked to read the list slowly and carefully, aiming for a between-word interval of approximately one second. These list readings provided citation forms against which the unscripted dialogue forms could be compared.
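The construction of such a wordlist can be sketched as follows. This is an assumption about how a list of this kind might be generated, not a record of how the original lists were produced; the function name and the seed parameter are hypothetical.

```python
import random


def build_wordlist(feature_names, repetitions=2, seed=None):
    """Return each distinct feature name `repetitions` times, shuffled."""
    rng = random.Random(seed)                     # seed only for repeatability
    distinct = list(dict.fromkeys(feature_names))  # drop duplicates, keep order
    items = [name for name in distinct for _ in range(repetitions)]
    rng.shuffle(items)
    return items


# Feature names taken from the examples given earlier in this description.
names = ["vast meadow", "reclaimed fields", "chestnut tree", "broken gate"]
wordlist = build_wordlist(names, seed=0)
```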

Materials were recorded on Digital Audio Tape (Sony DTC1000ES) using one Shure SM10A close-talking microphone and one DAT channel per speaker. Split-screen video recordings were also made for half of the dialogues, capturing an almost full-face image of both subjects. Dialogues were orthographically transcribed and then checked several times against the original DAT recordings.

All sixty-four subjects who participated were undergraduates at the University of Glasgow. Sixty-one of the 64 subjects were Scottish, 56 of them having been born or brought up within a thirty-mile radius of Glasgow. Half the subjects were male, half were female, and their mean age was 20. Subjects accommodated easily to the task and experimental setting, producing unselfconscious and relatively fluent speech.

Experimental Design

The experiment uses a Latin square design. Participants were asked to come to the experiment with someone they knew, thus forming familiar pairs. Two pairs make a quad. In the table, a1 came with a2, and b1 with b2.

Index		C	M	ID	Route
Q1/5	c1/7	+	+	12	1
	c2/8	+	-	9	2
	c3/5	-	+	6	3
	c4/6	-	-	3	4
Q2/6	c1/7	+	+	15	4
	c2/8	+	-	8	1
	c3/5	-	+	5	2
	c4/6	-	-	2	3
Q3/7	c1/7	+	+	14	3
	c2/8	+	-	11	4
	c3/5	-	+	4	1
	c4/6	-	-	1	2
Q4/8	c1/7	+	+	13	2
	c2/8	+	-	10	3
	c3/5	-	+	7	4
	c4/6	-	-	0	1

Speakers	Q1-4		Q5-8
	G	F	G	F
c1	a1	b1	a1	a2
c2	b2	a2	b2	b1
c3	a2	a1	a2	b2
c4	b1	b2	b1	a1
c5	a2	b2	a2	a1
c6	b1	a1	b1	b2
c7	a1	a2	a1	b1
c8	b2	b1	b2	a2

C = Contrast; M = Match; ID = Campus Interface(?) ID; G = Instruction Giver; F = Instruction Follower; Q = quad; c = conversation

Familiarity

Quads 1-4	c 1,2,5,6	Unfamiliar
	c 3,4,7,8	Familiar
Quads 5-8	c 1,2,5,6	Familiar
	c 3,4,7,8	Unfamiliar
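The familiarity summary can be checked against the Speakers columns of the design table. The sketch below transcribes the Giver and Follower of each conversation and treats two speakers as familiar when their codes share a letter (a1 came with a2, b1 with b2). The dictionary layout and function names are assumptions made for illustration.

```python
# (Giver, Follower) for conversations c1..c8, transcribed from the table.
ROLES = {
    "Q1-4": {1: ("a1", "b1"), 2: ("b2", "a2"), 3: ("a2", "a1"), 4: ("b1", "b2"),
             5: ("a2", "b2"), 6: ("b1", "a1"), 7: ("a1", "a2"), 8: ("b2", "b1")},
    "Q5-8": {1: ("a1", "a2"), 2: ("b2", "b1"), 3: ("a2", "b2"), 4: ("b1", "a1"),
             5: ("a2", "a1"), 6: ("b1", "b2"), 7: ("a1", "b1"), 8: ("b2", "a2")},
}


def is_familiar(giver, follower):
    """Speakers who arrived together share the letter in their code."""
    return giver[0] == follower[0]


def familiar_conversations(block):
    """Conversation numbers in which Giver and Follower know each other."""
    return sorted(c for c, (g, f) in ROLES[block].items() if is_familiar(g, f))


def role_counts(block):
    """Count dialogues per (subject, role, familiarity) combination."""
    counts = {}
    for giver, follower in ROLES[block].values():
        fam = "familiar" if is_familiar(giver, follower) else "unfamiliar"
        for subject, role in ((giver, "G"), (follower, "F")):
            key = (subject, role, fam)
            counts[key] = counts.get(key, 0) + 1
    return counts


print(familiar_conversations("Q1-4"))
print(familiar_conversations("Q5-8"))
```

If the transcription is right, every subject should appear exactly once in each combination of role and familiarity, which is what role_counts lets one confirm for either block of quads.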

Measuring task performance

The main measure of task performance that has been used for the Map Task is how far the route that the follower has drawn deviates from the route shown on the giver's map. To compute it, using the original A3-size maps, trace the giver's route onto acetate marked with a one-centimetre square grid, and lay it over the follower's map. The deviation score is the number of squares between the two routes. You can find pre-computed scores here, or, in future releases, in the NXT-format corpus resource that gives conversation-level information. This text file describes an alternative possible measure. The method was first described in print by A. H. Anderson, A. Clark, and J. Mullin (1991) Introducing information in dialogues: How young speakers refer and how young listeners respond. Journal of Child Language, 18, 663-687.
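The original scoring was done by hand with an acetate overlay. A computational analogue might look like the sketch below, which is not the procedure used to produce the published scores. It assumes both routes have been digitised as lists of (x, y) points in centimetres, that they share start and end points, and that a grid square counts as lying between the routes when its centre falls inside the region the two routes enclose. The function names and the grid dimensions in the example are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def deviation_score(giver_route, follower_route, width_cm, height_cm):
    """Count 1 cm grid squares whose centres lie between the two routes."""
    # Walk out along the giver's route and back along the follower's route
    # to close the region between them.
    polygon = list(giver_route) + list(reversed(follower_route))
    count = 0
    for col in range(width_cm):
        for row in range(height_cm):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                count += 1
    return count


# Toy example: the follower detours 2 cm away from a straight 10 cm route.
giver = [(0, 0), (10, 0)]
follower = [(0, 0), (0, 2), (10, 2), (10, 0)]
score = deviation_score(giver, follower, width_cm=12, height_cm=5)
```

Because the test uses the even-odd rule, routes that cross each other still contribute the squares on both sides of the crossing; routes that loop back over themselves would need more careful handling than this sketch provides.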

Some Corpus Statistics

The HCRC Map Task Corpus consists of 128 digitally recorded unscripted dialogues and 64 citation form readings of lists of landmark names. All dialogues were transcribed verbatim in standard orthography, including (where possible) filled pauses, false starts, hesitations, repetitions and interruptions.

Last modified: Fri Sep 12 15:04:13 BST 2008