Abstracts of HCRC Technical Reports

E. A. Boyle:
User's Guide to the HCRC Dialogue Database

This booklet is intended as a guide for those who want to use the HCRC dialogue database. It outlines who and what the database is for, what it stores, and how to access and use it.
(September 1990; 30 pages)
Ref. No. HCRC/TR-11 Price: UKL 1.30

Jean Carletta, Richard Caley and Stephen Isard:
A Collection of Self-Repairs from the Map Task Corpus

As part of a study of message generation in spontaneous speech, we have collected information about self-repairs in ten dialogues from the HCRC Map Task Corpus. This document describes the coding scheme used, which (1) identifies where the repairs occur on the transcripts, (2) describes what material interrupts the message during editing and how the editing phase relates to the utterance prosodically, (3) specifies how the major components of the repair are related to each other syntactically and semantically, and (4) suggests what sort of speaker difficulty is likely to have caused the repair. We include evidence that the judgment of whether or not a repair has occurred between two particular words or during a particular move is reliable. We also include a rough summary of the distribution and rate of repairs in the dialogues analysed.
(November 1993; 22 pages)
Ref. No. HCRC/TR-47 Price: UKL 1.10

David McKelvie and Henry S. Thompson
TEI-Conformant Structural Markup of a Trilingual Parallel Corpus

In this paper we provide an overview of the ACL European Corpus Initiative (ECI) Multilingual Corpus 1 (ECI/MC1). In particular, we look at one subcorpus of the ECI/MC1, the trilingual corpus of International Labour Organisation reports, and discuss the problems involved in TEI-compliant structural markup and preliminary alignment of this large corpus. We discuss gross structural alignment down to the level of text paragraphs. We see this as a necessary first step in corpus preparation before detailed (possibly automatic) alignment of texts is possible.
We try to generalise our experience with this corpus to illustrate the process of preliminary markup of large corpora which in their raw state can be in an arbitrary format (e.g. printers' tapes, proprietary word-processor formats), noisy (not fully parallel, with structure obscured by spelling mistakes), full of poorly documented formatting instructions, and structured in a way that is present but far from explicit. We illustrate these points by reference to other parallel subcorpora of ECI/MC1. We attempt to define some guidelines for the development of corpus annotation toolkits which would aid this kind of structural preparation of large corpora.
(June 1994; 9 pages)
Ref. No. HCRC/TR-48 Price: UKL 0.70

Jochen Dorre and Suresh Manandhar
On constraint-based Lambek calculi

We explore the consequences of layering a Lambek proof system over an arbitrary (constraint) logic. A simple model-theoretic semantics for our hybrid language is provided, for which a particularly simple combination of Lambek's proof system and the proof system of the base logic is complete. Furthermore, the proof system for the underlying base logic can be assumed to be a black box. The essential reasoning that the black box needs to perform is _entailment checking._ Assuming feature logic as the base logic, entailment checking amounts to a _subsumption_ test, which is a well-known quasi-linear time decidable problem.
(June 1995; 18 pages)
Ref. No. HCRC/TR-69 Price: UKL 1.00

Henry Thompson, Steve Finch and David McKelvie
The Normalized SGML Library (NSL)

This document describes the Normalised SGML Library (NSL), which consists of a set of C programs for manipulating SGML files and a C application program interface (API) designed to ease the writing of C programs which manipulate SGML documents.
(November 1995; 38 pages)
Ref. No. HCRC/TR-74 Price: UKL 1.40

Peter Yule
A Prolog implementation of the Method of Euler Circles for syllogistic reasoning

This paper presents a Prolog implementation of syllogistic logic, based on the Individual Identification Algorithm (IIA) (Stenning & Yule 1996), which has been shown to be relevant to human performance in syllogistic reasoning tasks. The IIA is isomorphic to the Method of Euler Circles (Euler 1772), and turns on the identification of necessary individuals prior to drawing quantified conclusions. Three variants of the method are presented: (1) Individual Identification in the case of two premisses, (2) Individual Identification in the case of multiple premisses, and (3) Drawing quantified conclusions in the case of two premisses. The complete Prolog code for each variant is included, along with examples.
(May 1996; 17 pages)
Ref. No. HCRC/TR-78 Price: UKL 1.00

Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko, Gwyneth Doherty-Sneddon, and Anne Anderson
HCRC Dialogue Structure Coding Manual

Currently, many researchers are using coding of discourse and dialogue phenomena in collected corpora to study the dynamics of dialogue. This manual describes a coding system based on utterance function, game structure, and higher level transaction structure, which has been applied to a corpus of spontaneous task-oriented spoken dialogues, the HCRC Map Task Corpus.
(June 1996; 27 pages)
Ref. No. HCRC/TR-82 Price: UKL 1.20

Alan W Black and Paul Taylor
Festival Speech Synthesis System: system documentation (version 1.1.1)

This document provides a user manual for the Festival Speech Synthesis System, version 1.1.1.
Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, and an Emacs interface. Festival is multi-lingual (currently English, Welsh and Spanish) though English is the most advanced.
The system is written in C++, uses the Edinburgh Speech Tools for low level architecture, and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format, which can generate a printed manual, info files and HTML.
The latest details and a full software distribution of the Festival Speech Synthesis System are available through its home page.
(January 1997; 160 pages)
Ref. No. HCRC/TR-83 Price: UKL 2.68

David McKelvie
SDP - Spoken Dialogue Parser

This report describes work done on part of speech tagging and parsing the Map Task corpus in the "Robust Parsing and Part-of-Speech Tagging of Transcribed Speech Corpora" project, funded by the ESRC (project R000236800). This report concentrates on the implementation of the software developed in the project and the format of the SGML annotation of the parse trees. An overview of the project's aims and results can be found here and an analysis of the speech disfluencies found while parsing the corpus can be found in HCRC/RP-95.

(May 1998; 60 pages)
Ref. No. HCRC/TR-96 Price: UKL ?.??

Jo Calder
Thistle: diagram display engines and editors

Thistle is a novel design for a parameterizable diagram display engine and editor. This report constitutes the reference documentation for the system. Instances of the Thistle scheme allow editing of diagrams associated with many different linguistic theories and formalisms. This generality arises from the use of a grammar to describe graphical conventions. The screen appearance of a diagram may be saved, as may its content for manipulation by other processors. A wide variety of diagrams are available in the form of on-line demonstrations.

(July 1998; 20 pages)
Ref. No. HCRC/TR-97 Price: UKL ?.??

Frank Keller, Martin Corley, Steffan Corley, Lars Konieczny, Amalia Todirascu
WebExp: A Java Toolbox for Web-Based Psychological Experiments

This User's Guide explains the installation and use of WebExp, a set of Java classes for conducting psychological experiments over the World Wide Web.

The WebExp toolbox consists of two modules: the WebExp server, which is a stand-alone Java application, and the WebExp client, which is implemented as a Java applet. The server application runs on the Web server that hosts the experiment, and waits for client applets to connect to it. The client runs on the machine of the subject taking the experiment. It administers the experiment and connects to the server application to download the experimental stimuli, and to store the subject's responses.

WebExp offers the following features for conducting Web-based experiments:

Further details about WebExp can be found at its webpage.

(July 1998; 17 pages)
Ref. No. HCRC/TR-99 Price: UKL ?.??

Robin Lickley
HCRC Disfluency Coding Manual

This document describes the disfluency coding scheme for the HCRC Map Task Corpus. The coding was done using Entropic Xwaves software with xlabel and aligned with the word-level segmentation of the corpus. The resulting coding can be examined with this software or accessed via XML.

Further details about the HCRC Disfluency Coding Manual can be found at its webpage.

(December 1998; ?? pages)
Ref. No. HCRC/TR-100 Price: UKL ?.??

L. Cahill, C. Doran, R. Evans, C. Mellish, D. Paiva, M. Reape, D. Scott, & N. Tipper
Towards a Reference Architecture for Natural Language Generation Systems (The RAGS Project)

(April 1999; 54 pages)
Ref. No. HCRC/TR-102 Price: UKL ?.??