The Design of the HCRC Map Task Corpus
Motivation for producing the Map Task Corpus
The HCRC Map Task Corpus was produced in response to one of the core
problems of work on natural language: much of our knowledge of
language is based on scripted materials, despite most language use
taking the form of unscripted dialogue with specific communicative
goals. There is, of course, good reason for this. There is no
guarantee that the phenomena of theoretical interest will appear with
any frequency in naturally occurring speech. Even huge corpora may
fail to provide sufficient instances to support any strong
claims about the phenomenon under study. In addition there is the
problem of context: critical aspects of both linguistic and
extralinguistic context may be either unknown or uncontrolled.
Prepared materials may lack spontaneity, but they are designed to elicit
specific examples of linguistic behaviour under controlled conditions,
and so ensure that particular research needs are met. Our
intention, therefore, was to elicit unscripted dialogues in such a way
as to boost the likelihood of occurrence of certain linguistic
phenomena, and to control some of the effects of context. To this
extent, while our dialogues are spontaneous, the corpus as a whole
amounts to a large, carefully controlled elicitation exercise. The
choice of variables manipulated in the design reflects the
different interests of the researchers involved in this data-gathering
effort.
The Map Task design was intended to provide a common corpus for a
vertical study of dialogue, generating material which can be analysed
at every level from the acoustic to the sociolinguistic. All the
relevant parameters incorporated in the design are described here. The
various forms in which the resulting corpus is available are described
on the top-level page of the HCRC Map Task Corpus website.
The Map Task
The Map Task is a cooperative task involving two participants.
The two speakers sit opposite one another and each has a map which the
other cannot see.
One speaker, designated the Instruction Giver, has a route marked on
her map; the other speaker, the Instruction Follower, has no route.
The speakers are told that their goal is to reproduce the Instruction
Giver's route on the Instruction Follower's map.
The maps are not identical and the speakers are told this explicitly
at the beginning of their first session.
It is, however, up to them to discover how the two maps differ.
Map Design
All maps consist of landmarks, or features,
portrayed as line drawings and labelled with their intended names.
The differences between the maps result from the systematic manipulation of
a design variable we refer to as sharedness:
the extent to which features contrast or are shared between pairs of
maps.
Features were deemed common if the
same form and label appeared in the same location on both
the Giver's and the Follower's map.
Features which were not common differed in one of three ways:
- Absent/Present features were found on one map but not the other;
- Name Change features were identical in form and location but
had different labels on the two maps;
- 2:1 features appeared twice on the Giver's map, once in a
position close to the route and once more distant, while the
Follower had only the distant, irrelevant one.
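The four sharedness categories can be expressed as a small classification routine. What follows is a minimal sketch under stated assumptions, not the corpus's own map encoding: the `Landmark` record, its `form` identifier and its grid `location` are hypothetical, and landmarks are grouped by drawing because a Name Change feature keeps its form and location but not its label. The labels "old gate", "mill", "well" and "lake" in the example are invented for illustration; whether a 2:1 copy is near the route is not checked.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    form: str                  # hypothetical identifier for the line drawing
    label: str                 # name printed beside the drawing
    location: tuple[int, int]  # hypothetical map coordinates


def classify_sharedness(giver, follower):
    """Assign each landmark form to one of the sharedness categories."""
    g_by_form = defaultdict(list)
    f_by_form = defaultdict(list)
    for lm in giver:
        g_by_form[lm.form].append(lm)
    for lm in follower:
        f_by_form[lm.form].append(lm)

    result = {}
    for form in sorted(set(g_by_form) | set(f_by_form)):
        g, f = g_by_form[form], f_by_form[form]
        if not g or not f:
            result[form] = "absent/present"
        elif len(g) == 2 and len(f) == 1 and f[0] in g:
            # Follower has one of the Giver's two copies; proximity to the
            # route (which copy is "distant") is not checked in this sketch.
            result[form] = "2:1"
        elif len(g) == 1 and len(f) == 1 and g[0].location == f[0].location:
            result[form] = "common" if g[0].label == f[0].label else "name change"
        else:
            result[form] = "unclassified"  # outside the four design categories
    return result


# Hypothetical example maps.
giver = [
    Landmark("tree", "chestnut tree", (1, 1)),
    Landmark("gate", "broken gate", (2, 2)),
    Landmark("mill", "mill", (3, 3)),
    Landmark("well", "well", (4, 4)),
    Landmark("well", "well", (9, 9)),
]
follower = [
    Landmark("tree", "chestnut tree", (1, 1)),
    Landmark("gate", "old gate", (2, 2)),
    Landmark("well", "well", (9, 9)),
    Landmark("lake", "lake", (5, 5)),
]
print(classify_sharedness(giver, follower))
```

Grouping by form rather than by label is the design choice that lets the same routine tell a Name Change apart from an Absent/Present pair.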
All map routes begin with a starting point, marked on both maps, and
end with a finishing point marked only on the Instruction Giver's map.
Both start and end points are adjacent to a common feature,
but the landmarks between these points alternate in sharedness.
This manipulation of mismatches between landmarks enables us to
control the information initially shared by the participants.
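One literal reading of this route layout can be written as a check. This is an assumption: the text says only that landmarks between the start and finish "alternate in sharedness", which the sketch reads as strict alternation between common and non-common landmarks. The boolean encoding of a route is likewise hypothetical.

```python
def matches_route_pattern(sharedness):
    """Check a route against one assumed reading of the design.

    `sharedness` lists the landmarks along a route in order, True for a
    common landmark and False otherwise. The first and last entries are the
    features next to the start and finish points, which must be common;
    adjacent landmarks are assumed to alternate strictly.
    """
    if len(sharedness) < 2 or not (sharedness[0] and sharedness[-1]):
        return False
    return all(a != b for a, b in zip(sharedness, sharedness[1:]))


print(matches_route_pattern([True, False, True, False, True]))
```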
Since the only constraint on the range of map landmarks is the ease
with which the feature can be represented graphically
(that is, choice is restricted only by the ingenuity of the artist),
we were able to include landmark names of phonological interest.
Thus, feature names provided sites for four optional phonological
reduction processes:
- /t/-deletion eg vast meadow
- /d/-deletion eg reclaimed fields
- glottalisation eg chestnut tree
- nasal assimilation eg broken gate
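Candidate reduction sites of this kind sit at word boundaries in two-word names, so they can be flagged roughly from spelling alone. The sketch below is a crude orthographic heuristic that reproduces the four examples above; it is not the method used to design the maps and not a phonological analysis (spelling does not reliably show consonant clusters or velar onsets).

```python
VOWELS = set("aeiou")


def reduction_sites(name):
    """Flag candidate reduction sites at word boundaries in a landmark name.

    Heuristic, spelling-based only: a word-final t/d before a consonant-initial
    word, or a word-final n before g/k.
    """
    words = name.lower().split()
    sites = []
    for w1, w2 in zip(words, words[1:]):
        last, before_last, first = w1[-1], w1[-2:-1], w2[0]
        if last == "n" and first in "gk":
            sites.append("nasal assimilation")  # /n/ may surface as a velar nasal
        elif last == "t" and first not in VOWELS:
            if before_last and before_last not in VOWELS:
                sites.append("/t/-deletion")    # final cluster, as in "vast"
            else:
                sites.append("glottalisation")  # single final /t/, as in "chestnut"
        elif last == "d" and first not in VOWELS:
            sites.append("/d/-deletion")
    return sites


for example in ["vast meadow", "reclaimed fields", "chestnut tree", "broken gate"]:
    print(example, reduction_sites(example))
```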
Landmark names also provided examples of polysyllabic words with
differing metrical structure (eg words with an initial strong-weak
(S-W) stress pattern, like buffalo, and words with an initial
weak-strong (W-S) pattern, like baboons).
Example Instruction Giver's map and matching Instruction Follower's map.
Familiarity and Eye-Contact
In addition to the design variables relating to the maps themselves,
two other variables were incorporated in the design of the corpus
overall.
Subjects are necessarily paired for the task, and since the pairing is
under the experimenter's control, we were able to vary the familiarity
between participants systematically by asking subjects to attend
with a friend. Each pair of familiar subjects was tested alongside
another pair who were unknown to either member of the first pair.
The two pairs formed a quadruple of subjects (a quad) who used among
them a different set of four map pairs, with maps being assigned to
pairs by a Latin square.
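A Latin square of order n places each of n items exactly once in every row and every column. A minimal sketch of the standard cyclic construction follows; for n = 4 its rows match the Route column of the table in the Experimental Design section, read block by block (Q1/5, Q2/6, Q3/7, Q4/8), though the text does not say the original assignment was generated this way.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square with entries 1..n.

    Row r is the sequence 1..n rotated right by r places, so every value
    occurs exactly once in each row and each column.
    """
    return [[(c - r) % n + 1 for c in range(n)] for r in range(n)]


for row in cyclic_latin_square(4):
    print(row)
```

The cyclic form is the simplest Latin square; a randomised design would usually also permute its rows, columns and symbols.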
Each subject participated in four dialogues, twice as Instruction
Giver and twice as Instruction Follower, once in each case with a
familiar partner, and once with an unfamiliar partner.
As Instruction Giver, subjects gave directions from the same map both
times, but when following they used a different map each time.
Half of the subjects gave instructions to a familiar partner first,
the others to an unfamiliar partner first.
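The role and familiarity constraints above can be checked mechanically for one quad. In this sketch the subject names follow the table in the Experimental Design section (a1 came with a2, b1 with b2), and the unfamiliar pairing (a1 with b1, a2 with b2) is read off that table's giver/follower columns; the helper names are hypothetical, and the order of dialogues (familiar partner first or second) is not modelled.

```python
SUBJECTS = ["a1", "a2", "b1", "b2"]  # a1 came with a2, b1 with b2


def partners(subject):
    """Return (familiar, unfamiliar) partner for a subject in a quad."""
    letter, number = subject[0], subject[1]
    familiar = letter + ("2" if number == "1" else "1")
    unfamiliar = ("b" if letter == "a" else "a") + number
    return familiar, unfamiliar


def quad_dialogues():
    """Each subject gives once to each partner: (giver, follower, familiar?)."""
    dialogues = []
    for giver in SUBJECTS:
        familiar, unfamiliar = partners(giver)
        dialogues.append((giver, familiar, True))
        dialogues.append((giver, unfamiliar, False))
    return dialogues


def check_balance(dialogues):
    """Every subject gives twice and follows twice, once familiar in each role."""
    for s in SUBJECTS:
        gives = sorted(fam for g, _, fam in dialogues if g == s)
        follows = sorted(fam for _, f, fam in dialogues if f == s)
        if gives != [False, True] or follows != [False, True]:
            return False
    return True


d = quad_dialogues()
print(len(d), check_balance(d))  # 8 True
```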
The option of placing a small barrier between Map Task participants to
prevent them from seeing each other's faces allowed us to control the
availability of the visual channel for communication.
Half of the subjects who took part in the task were able to make
eye-contact
with their partner,
while the other half had no eye-contact.
Procedure
Subjects sat three or four feet apart, facing each other across a
desk, with their maps placed on sloping boards, to prevent each
subject seeing the other's map.
Pairs of subjects were randomly assigned to one of the two eye-contact
conditions.
After they had completed their map dialogues, subjects were asked to
read a wordlist containing all the feature names from the set of maps
they had encountered. Each feature name appeared twice, in random
order, and subjects were asked to read the list slowly and carefully,
aiming for an interval of approximately one second between words.
These list readings provided citation forms against which the
unscripted dialogue forms could be compared.
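A wordlist of this kind (every feature name twice, in random order) can be sketched as follows. The function name and the optional seed are illustrative assumptions; the text describes no further ordering constraint, such as keeping the two tokens of a name apart, so none is applied.

```python
import random


def make_wordlist(feature_names, seed=None):
    """Build a reading list in which every feature name appears twice.

    The order is a plain random shuffle; a fixed seed makes it reproducible.
    """
    words = list(feature_names) * 2
    random.Random(seed).shuffle(words)
    return words


print(make_wordlist(["vast meadow", "broken gate", "chestnut tree"], seed=1))
```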
Materials were recorded on Digital Audio Tape (Sony DTC1000ES)
using one Shure SM10A close-talking microphone and one DAT channel per
speaker. Split-screen video recordings were also made for half of the
dialogues, capturing an almost full-face image of both subjects.
Dialogues were orthographically transcribed and then checked several
times against the original DAT recordings.
All sixty-four subjects who participated were undergraduates at the
University of Glasgow.
Sixty-one of the 64 subjects were Scottish, 56 of them having been
born or brought up within a thirty-mile radius of Glasgow.
Half the subjects were male, half were female, and their mean age was 20.
Subjects accommodated easily to the task and experimental setting,
producing unselfconscious and relatively fluent speech.
Experimental Design
The experiment uses a Latin square design. Participants were asked to come
to the experiment with someone they knew, thus forming familiar pairs. Two
pairs make a quad. In the table, a1 came with a2, and b1 with b2.
Index       | C | M | ID | Route | Speakers
            |   |   |    |       | Q1-4 | G  | F
Q1/5 | c1/7 | + | + | 12 | 1     | c1   | a1 | b1
     | c2/8 | + | - |  9 | 2     | c2   | b2 | a2
     | c3/5 | - | + |  6 | 3     | c3   | a2 | a1
     | c4/6 | - | - |  3 | 4     | c4   | b1 | b2
Q2/6 | c1/7 | + | + | 15 | 4     | c5   | a2 | b2
     | c2/8 | + | - |  8 | 1     | c6   | b1 | a1
     | c3/5 | - | + |  5 | 2     | c7   | a1 | a2
     | c4/6 | - | - |  2 | 3     | c8   | b2 | b1
            |   |   |    |       | Q5-8 | G  | F
Q3/7 | c1/7 | + | + | 14 | 3     | c1   | a1 | a2
     | c2/8 | + | - | 11 | 4     | c2   | b2 | b1
     | c3/5 | - | + |  4 | 1     | c3   | a2 | b2
     | c4/6 | - | - |  1 | 2     | c4   | b1 | a1
Q4/8 | c1/7 | + | + | 13 | 2     | c5   | a2 | a1
     | c2/8 | + | - | 10 | 3     | c6   | b1 | b2
     | c3/5 | - | + |  7 | 4     | c7   | a1 | b1
     | c4/6 | - | - |  0 | 1     | c8   | b2 | a2
C = Contrast; M = Match; ID = Campus Interface(?) ID; G = Instruction Giver; F = Instruction Follower
Familiarity
Quads 1-4 | c1, c2, c5, c6 | Unfamiliar
          | c3, c4, c7, c8 | Familiar
Quads 5-8 | c1, c2, c5, c6 | Familiar
          | c3, c4, c7, c8 | Unfamiliar
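The two tables can be cross-checked: a dialogue should be labelled Familiar exactly when giver and follower came to the experiment together. The sketch below copies the giver/follower pairs from the design table; it assumes (a reading of the flattened layout, not something the text states) that the Q1-4 sub-header applies to the Q1/5 and Q2/6 blocks and the Q5-8 sub-header to the Q3/7 and Q4/8 blocks.

```python
# (conversation, giver, follower) rows copied from the design table.
BLOCKS = {
    "Q1/5": [("c1", "a1", "b1"), ("c2", "b2", "a2"), ("c3", "a2", "a1"), ("c4", "b1", "b2")],
    "Q2/6": [("c5", "a2", "b2"), ("c6", "b1", "a1"), ("c7", "a1", "a2"), ("c8", "b2", "b1")],
    "Q3/7": [("c1", "a1", "a2"), ("c2", "b2", "b1"), ("c3", "a2", "b2"), ("c4", "b1", "a1")],
    "Q4/8": [("c5", "a2", "a1"), ("c6", "b1", "b2"), ("c7", "a1", "b1"), ("c8", "b2", "a2")],
}
# Assumed reading of the sub-headers.
GROUPS = {"Q1-4": ["Q1/5", "Q2/6"], "Q5-8": ["Q3/7", "Q4/8"]}
# Familiar conversations, from the Familiarity table.
FAMILIAR = {"Q1-4": {"c3", "c4", "c7", "c8"}, "Q5-8": {"c1", "c2", "c5", "c6"}}


def familiarity_consistent():
    """True if every dialogue's pairing agrees with the Familiarity table."""
    for group, blocks in GROUPS.items():
        for block in blocks:
            for conv, giver, follower in BLOCKS[block]:
                came_together = giver[0] == follower[0]  # a1/a2 or b1/b2
                if came_together != (conv in FAMILIAR[group]):
                    return False
    return True


print(familiarity_consistent())  # True
```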
Measuring task performance
The main measure of task performance that has been used for the
Map Task is in terms of how far the route that the follower has
drawn deviates from the route shown on the giver's map. To reconstruct
it, using the original A3 size maps, trace the giver's route on acetate
marked with a one centimetre square grid, and impose it over the follower's
map. The deviation score is the number of squares between the
two routes. You can find pre-computed scores
here, or, in future releases, in the
NXT-format corpus resource that gives conversation-level
information. This text file describes
an alternative possible measure.
The method was first described in print in:
A. H. Anderson, A. Clark, and J. Mullin (1991). Introducing information
in dialogues: How young speakers refer and how young listeners respond.
Journal of Child Language, 18, 663-687.
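The acetate-grid procedure described above can be approximated digitally once both routes are available as coordinates. This is a sketch under several assumptions, not the published scoring procedure: routes are hypothetical polylines of (x, y) points in centimetres on the A3 map, the region between them is closed by joining the giver's route to the reversed follower's route, and a grid square counts when its centre lies inside that region (the manual method leaves partly covered squares to the scorer's judgement). Crossing routes are handled with the even-odd rule.

```python
import math


def inside(x, y, polygon):
    """Even-odd ray casting: is (x, y) inside the closed polygon?"""
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def deviation_score(giver_route, follower_route, cell_cm=1.0):
    """Count grid cells whose centres lie between the two routes."""
    polygon = list(giver_route) + list(reversed(follower_route))
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    count = 0
    for i in range(math.floor(min(xs) / cell_cm), math.ceil(max(xs) / cell_cm)):
        for j in range(math.floor(min(ys) / cell_cm), math.ceil(max(ys) / cell_cm)):
            cx, cy = (i + 0.5) * cell_cm, (j + 0.5) * cell_cm
            if inside(cx, cy, polygon):
                count += 1
    return count


# Hypothetical routes: the follower detours 2 cm around a 4 cm straight segment.
giver = [(0, 0), (4, 0)]
follower = [(0, 0), (0, 2), (4, 2), (4, 0)]
print(deviation_score(giver, follower))  # 8
```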
Some Corpus Statistics
The HCRC Map Task Corpus consists of 128 digitally recorded unscripted
dialogues and 64 citation form readings of lists of landmark names.
All dialogues were transcribed verbatim in standard orthography,
including (where possible) filled pauses, false starts, hesitations,
repetitions and interruptions.
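These totals follow directly from the design: each of the 64 subjects took part in four dialogues, each dialogue involves two subjects, and each quad of four subjects therefore produces eight dialogues. The 64 citation-form readings are one wordlist reading per subject.

```latex
\begin{aligned}
\text{dialogues} &= \frac{64 \times 4}{2} = 128 \\
\text{quads} &= \frac{64}{4} = 16, \qquad 16 \times \frac{4 \times 4}{2} = 16 \times 8 = 128
\end{aligned}
```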
Last modified: Fri Sep 12 15:04:13 BST 2008