Jean Carletta, Richard Caley and Stephen Isard:
A Collection of Self-Repairs from the Map Task Corpus
As part of a study of message generation in spontaneous speech, we
have collected information about self-repairs in ten dialogues from
the HCRC Map Task Corpus. This document describes the coding scheme
which was used, which (1) identifies where the repairs occur on the
transcripts, (2) describes what material interrupts the message during
editing and how the editing phase relates to the utterance
prosodically, (3) specifies how the major components of the repair are
related to each other syntactically and semantically, and (4) suggests what
sort of speaker difficulty is likely to have caused the repair. We
include evidence that the judgment of whether or not a repair has
occurred between two particular words or during a particular move is
reliable. We also include a rough summary of the distribution and
rate of repairs in the dialogues analysed.
(November 1993; 22 pages)
Ref. No. HCRC/TR-47 Price: UKL 1.10
David McKelvie and Henry S. Thompson
TEI-Conformant Structural Markup of a Trilingual Parallel Corpus
In this paper we provide an overview of the ACL European Corpus
Initiative (ECI) Multilingual Corpus 1 (ECI/MC1). In particular, we
look at one subcorpus of the ECI/MC1, the trilingual corpus
of International Labour Organisation reports, and discuss the problems
involved in TEI-compliant structural markup and preliminary alignment
of this large corpus. We discuss gross structural alignment down to
the level of text paragraphs. We see this as a necessary first step
in corpus preparation before detailed (possibly automatic) alignment
of texts is possible.
We try to generalise our experience with this corpus to illustrate
the process of preliminary markup of large corpora which in their raw
state can be in an arbitrary format (e.g. printers' tapes, proprietary
word-processor format); noisy (not fully parallel, with structure
obscured by spelling mistakes); full of poorly documented formatting
instructions; and whose structure is present but anything but
explicit. We illustrate these points by reference to other parallel
subcorpora of ECI/MC1. We attempt to define some guidelines for the
development of corpus annotation toolkits which would aid this kind of
structural preparation of large corpora.
(June 1994; 9 pages)
Ref. No. HCRC/TR-48 Price: UKL 0.70
Jochen Dorre and Suresh Manandhar
On constraint-based Lambek calculi
We explore the consequences of layering a Lambek proof system over
an arbitrary (constraint) logic. A simple model-theoretic semantics
for our hybrid language is provided for which a particularly simple
combination of Lambek's proof system and that of the base logic is
complete. Furthermore, the proof system for the underlying base logic
can be assumed to be a black box. The essential reasoning that the
black box must perform is _entailment checking._ Assuming feature
logic as the base logic, entailment checking amounts to a
_subsumption_ test, which is a well-known quasi-linear time decidable
problem.
(June 1995; 18 pages)
Ref. No. HCRC/TR-69 Price: UKL 1.00
Henry Thompson, Steve Finch and David McKelvie
The Normalized SGML Library (NSL)
This document describes the Normalised SGML Library (NSL), which
consists of a set of C programs for manipulating SGML files and a C
application program interface (API) designed to ease the writing of C
programs which manipulate SGML documents.
(November 1995; 38 pages)
Ref. No. HCRC/TR-74 Price: UKL 1.40
Peter Yule
A prolog implementation of the Method of Euler Circles for
syllogistic reasoning
This paper presents a Prolog implementation of
syllogistic logic, based on the Individual Identification Algorithm
(IIA) (Stenning & Yule 1996), which has been shown to be relevant to
human performance in syllogistic reasoning tasks. The IIA is isomorphic
to the Method of Euler Circles (Euler 1772), and turns on the
identification of necessary individuals prior to drawing quantified
conclusions. Three variants of the method are presented: (1) Individual
Identification in the case of two premisses, (2) Individual
Identification in the case of multiple premisses, and (3) Drawing
quantified conclusions in the case of two premisses. The complete
Prolog code for each variant is included, along with examples.
(May 1996; 17 pages)
Ref. No. HCRC/TR-78 Price: UKL 1.00
Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko,
Gwyneth Doherty-Sneddon, and Anne Anderson
HCRC Dialogue Structure Coding Manual
Currently, many researchers are using coding of discourse and
dialogue phenomena in collected corpora to study the dynamics of
dialogue. This manual describes a coding system based on utterance
function, game structure, and higher level transaction structure,
which has been applied to a corpus of spontaneous task-oriented
spoken dialogues, the HCRC Map Task Corpus.
(June 1996; 27 pages)
Ref. No. HCRC/TR-82 Price: UKL 1.20
Alan W Black and Paul Taylor
Festival Speech Synthesis System: system documentation (version 1.1.1)
This document provides a user manual for the Festival
Speech Synthesis System, version 1.1.1.
Festival offers a general framework for building speech synthesis
systems as well as including examples of various modules. As a whole
it offers full text to speech through a number of APIs: from shell level,
through a Scheme command interpreter, as a C++ library, and an Emacs
interface. Festival is multi-lingual (currently English, Welsh and
Spanish) though English is the most advanced.
The system is written in C++ and uses the Edinburgh Speech Tools
for low level architecture and has a Scheme (SIOD) based command
interpreter for control. Documentation is given in the FSF texinfo
format which can generate a printed manual, info files and HTML.
The latest details and a full software distribution of the Festival
Speech Synthesis System are available through its
home page.
(January 1997; 160 pages)
Ref. No. HCRC/TR-83 Price: UKL 2.68
David McKelvie
SDP - Spoken Dialogue Parser
This report describes work done on part-of-speech
tagging and parsing the Map Task corpus in the "Robust Parsing
and Part-of-Speech Tagging of Transcribed Speech Corpora" project,
funded by the ESRC (project R000236800). This report concentrates on
the implementation of the software developed in the project and the
format of the SGML annotation of the parse trees. An overview of the
project's aims and results can be found
here and an analysis
of the speech disfluencies found while parsing the corpus can be found
in HCRC/RP-95.
(May 1998; 60 pages)
Ref. No. HCRC/TR-96 Price: UKL ?.??
Jo Calder
Thistle: diagram display engines and editors
Thistle is a novel design for a parameterizable diagram display
engine and editor. This report constitutes the reference
documentation for the system. Instances of the Thistle scheme allow
editing of diagrams associated with many different linguistic theories
and formalisms. This generality arises from the use of a grammar to
describe graphical conventions. The screen appearance of a diagram
may be saved, as may its content for manipulation by other processors.
A wide variety of diagrams are available in the form of
on-line demonstrations.
(July 1998; 20 pages)
Ref. No. HCRC/TR-97 Price: UKL ?.??
Frank Keller, Martin Corley, Steffan Corley, Lars Konieczny, Amalia
Todirascu
WebExp: A Java Toolbox for Web-Based Psychological Experiments
This User's Guide explains the installation and use of WebExp, a set of Java classes for conducting psychological experiments over the World Wide Web.
The WebExp toolbox consists of two modules: the WebExp server, which is a stand-alone Java application, and the WebExp client, which is implemented as a Java applet. The server application runs on the Web server that hosts the experiment, and waits for client applets to connect to it. The client runs on the machine of the subject taking the experiment. It administers the experiment and connects to the server application to download the experimental stimuli, and to store the subject's responses.
WebExp offers the following features for conducting Web-based experiments:
Further details about WebExp can be found at its webpage.
(July 1998; 17 pages)
Ref. No. HCRC/TR-99 Price: UKL ?.??
Robin Lickley
HCRC Disfluency Coding Manual
This document describes the disfluency coding scheme for the HCRC Map Task Corpus. The coding was done using Entropic Xwaves software with xlabel and aligned with the word-level segmentation of the corpus. The resulting code can be examined with this software or accessed via XML.
Further details about the HCRC Disfluency Coding Manual can be found at its webpage.
(December 1998; ?? pages)
Ref. No. HCRC/TR-100 Price: UKL ?.??
L. Cahill, C. Doran, R. Evans, C. Mellish, D. Paiva, M. Reape, D. Scott, & N. Tipper
Towards a Reference
Architecture for Natural Language Generation Systems (The RAGS Project)
(April 1999; 54 pages)
Ref. No. HCRC/TR-102 Price: UKL ?.??