Mimo Caenepeel
Putting while in context
This paper gives an analysis of the connective while and the way
it functions in context, based on a study of a set of while-clauses in
two different text types. I propose to view while as one
connective with a basic meaning and a number of different surface
realizations. I show that different interpretations of
while are subject to constraints coming from different
directions, and that in determining how a particular interpretation
arises, attributes of the sentence as well as its context need to be
taken into account. In the proposed approach, syntactic structure,
semantic meaning and discourse function are closely intertwined.
(May 1997: 34 pages)
Ref. No. HCRC/RP-85 Price: UKL 0.90
Renata Vieira & Massimo Poesio
Corpus-based processing of definite descriptions
To appear in Botley and McEnery eds., "Corpus-based
and computational approaches to anaphora".
We discuss in this paper a system that resolves definite descriptions
in written texts. A preliminary study of definite descriptions in a
collection of 20 texts revealed that about 30% of the 1040 definites
in the collection were cases of anaphoric definites whose antecedents
had the same head noun, and 50% introduced novel discourse
referents. An algorithm which resolves anaphoric definite descriptions
and also identifies novel ones is proposed. We evaluated the algorithm
by comparing its results with an annotation produced by human
subjects. The analysis of the corpus, the implemented algorithm, and
the evaluation of the results are presented in this paper.
(May 1997: 15 pages)
Ref. No. HCRC/RP-86 Price: UKL 0.66
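As a rough illustration of the same-head-noun case mentioned in the
abstract above, the following minimal sketch labels each definite
description as anaphoric (an earlier noun phrase shares its head noun)
or novel. It is not the authors' algorithm. The function names, the
pre-chunked input format, and the rule that the last token of a noun
phrase is its head are all assumptions made for the example.

```python
def head(phrase):
    """Crude head finder (assumption): the last token of the noun phrase."""
    return phrase[-1].lower()


def classify_definites(noun_phrases):
    """Pair each definite description with a same-head antecedent, if any.

    noun_phrases: token lists in text order, e.g. [["a", "house"], ["the", "house"]].
    Returns (index, antecedent_index) pairs for phrases starting with "the";
    antecedent_index is None when no earlier phrase has the same head,
    which this sketch treats as a novel (first-mention) description.
    """
    last_seen = {}  # head noun -> index of the most recent phrase with that head
    results = []
    for i, phrase in enumerate(noun_phrases):
        h = head(phrase)
        if phrase[0].lower() == "the":
            results.append((i, last_seen.get(h)))
        last_seen[h] = i
    return results


phrases = [["a", "house"], ["the", "door"], ["the", "house"], ["the", "old", "house"]]
for index, antecedent in classify_definites(phrases):
    label = "novel" if antecedent is None else f"anaphoric to phrase {antecedent}"
    print(" ".join(phrases[index]), "->", label)
```

A heuristic this simple covers only the same-head case; it says nothing
about definites that are neither same-head anaphora nor novel.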
Massimo Poesio, Renata Vieira and Simone Teufel
Resolving bridging descriptions in unrestricted texts
Our goal is to develop a system capable of treating the largest
possible subset of definite descriptions in unrestricted written
texts. A previous prototype resolved anaphoric uses of definite
descriptions and identified some types of first-mention uses,
achieving a recall of 56%. In this paper we present the latest
version of our system, which handles some types of bridging references,
uses WordNet as a source of lexical knowledge, and achieves a recall
of 65%.
(May 1997: 6 pages)
Ref. No. HCRC/RP-87 Price: UKL 0.54
Robert Inder (University of Edinburgh),
Dick Bulterman (CWI),
Pedro Basagoiti (Software AG Espana),
Rob Cannell (Implex Environmental Systems Limited),
Mandy Haggith (University of Edinburgh),
Peter Kullmann (University of Karlsruhe),
Bill Ritchie (Assynt Crofters Trust),
Isabel Serrasolsas (CEAM)
Sustainable Telematics for Environmental Management (STEM final report)
This is the final report from STEM, a project funded in the Environment
sector of the Telematics Applications Programme of the European Union.
STEM's overall aim is to provide easy-to-use telematics tools to improve
the information available to land managers in making decisions. This
project was a one-year study to demonstrate the feasibility of constructing
a telematics application to help them access accurate and up-to-date
information. It was driven by the requirements of two groups of users with
responsibility for environmental planning and management in remote and
environmentally sensitive Objective 1 areas of Europe. Work closely
followed the original project plan. Initial requirements capture confirmed
the demand for a tool like STEM, and documented the required functionality.
An outline architecture for an open, extensible system to meet those needs
was then designed. This is based on an environment for creating and
sharing sets of configurable, domain-specific facilities for accessing,
combining and appropriately presenting information for a particular
purpose. Finally, over 50 issues affecting the feasibility of building
such a system were identified and assessed. This showed that there will be
technical challenges in bringing together the range of software components
needed, and some formidable administrative and organisational obstacles to
be overcome. Nevertheless, our overall conclusion is that it is feasible
to build a working telematics system which will be a valuable and powerful
tool to assist in environmental management.
(May 1997: 38 pages)
Ref. No. HCRC/RP-88 Price: UKL 0.95
Claire Hewson, Richard Cox & Keith Stenning
A study of Statistical Process Control training needs in small and medium UK companies
This report examines the implementation of Statistical Process Control
(SPC) in a sample of six UK companies. Practices, problems, and
training procedures are identified and compared in order to assess
training needs. Recommendations are made for effective SPC training
and a prototype training package is described.
(June 1997: 58 pages)
Ref. No. HCRC/RP-89 Price: UKL 1.90
Korin Richmond, Andrew Smith & Einat Amitay
Detecting Subject Boundaries within Text: A Language Independent
Statistical Approach
We describe here an algorithm for detecting subject boundaries
within text, based on a statistical lexical similarity measure. Hearst
has already tackled this problem with good results (Hearst, 1994). One
of her main assumptions is that a change in subject is accompanied by
a change in vocabulary. Starting from this assumption, and introducing
a new measure of word significance, we have built a robust and
reliable algorithm which exhibits improved accuracy without
sacrificing language independence.
(June 1997: 8 pages)
Ref. No. HCRC/RP-90 Price: UKL 0.70
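The vocabulary-change assumption in the abstract above can be
illustrated with a minimal sketch in the spirit of Hearst's approach:
compare the bag of words on either side of each sentence gap and treat
the gap with the lowest similarity as the likeliest subject boundary.
This uses plain term counts and cosine similarity, not the authors' new
word significance measure, which the abstract does not specify. All
function names and the window parameter are assumptions made for the
example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, window=2):
    """Similarity across each gap, comparing up to `window` sentences per side."""
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        sims.append(cosine(left, right))
    return sims


def likeliest_boundary(sentences, window=2):
    """Index of the sentence that follows the lowest-similarity gap."""
    sims = gap_similarities(sentences, window)
    return sims.index(min(sims)) + 1


text = [
    "The cat sat on the mat.",
    "The cat chased the mouse on the mat.",
    "Stock prices fell sharply today.",
    "Investors sold stock as prices fell.",
]
print(likeliest_boundary(text))
```

Because the sketch relies only on token overlap, with no stop list or
stemmer, it makes no use of language-specific resources.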
Steven Bird
When Marking Tone Reduces Fluency: An Orthography Experiment in Cameroon
Should an alphabetic orthography for a tone language include tone
marks? Opinion and practice are divided along three lines: zero
marking, phonemic marking and various reduced marking schemes. This
paper examines the success of phonemic tone marking for Dschang, a
Grassfields Bantu language which uses tone to distinguish lexical
items and some grammatical constructions.
Participants of various ages and educational backgrounds, and with
different levels of exposure to the orthography, were tested
on location in the Western Province of Cameroon. All but one had
attended classes on tone marking. Participants read texts which were
marked and unmarked for tone, then added tone marks to the unmarked
texts. Analysis shows that tone marking degrades reading fluency and
does not help to resolve tonally ambiguous words. Experienced writers
attain an accuracy score of 83.5% in adding tone marks to a text,
while inexperienced writers score a mere 53%, which is not much
better than chance. A reduced tone orthography is briefly described,
and early indications are that it will out-perform both tone
orthographies tested here. A detailed survey of other experimental
work on African tone orthography lays the groundwork for the
experiment.
(June 1997: 35 pages)
Ref. No. HCRC/RP-91 Price: UKL 1.30
Massimo Poesio & David R. Traum
Conversational Actions and Discourse Situations
We use the idea that actions performed in a conversation become part
of the common ground as the basis for a model of context that
reconciles in a general and systematic fashion the differences
between the theories of discourse context used for reference
resolution, intention recognition, and dialogue management. We start
from the treatment of anaphoric accessibility developed in DRT, and we
show first how to obtain a discourse model that, while preserving
DRT's basic ideas
about referential accessibility, includes information about the
occurrence of speech acts and their relations. Next, we show how the
different kinds of 'structure' that play a role in conversation
(discourse segmentation, turn-taking, and grounding) can be
formulated in terms of information about speech
acts, and use this same information as the basis for a model of the
interpretation of fragmentary input.
To appear in Computational Intelligence.
(September 1997: 53 pages)
Ref. No. HCRC/RP-92 Price: UKL 1.80
David McKelvie, Chris Brew & Henry Thompson
Using SGML as a Basis for Data-Intensive NLP
This paper describes the LTNSL system (McKelvie96), an
architecture for writing corpus processing tools. This system is
then compared with two other systems which address similar
issues, the GATE system from Sheffield and the IMS Corpus Workbench.
In particular we address the advantages and
disadvantages of an SGML approach compared with a non-SGML
database approach.
(September 1997: 23 pages)
Ref. No. HCRC/RP-93 Price: UKL 1.10