Putting while in context
This paper gives an analysis of the connective while and the way it functions in context, based on a study of a set of while-clauses in two different text types. I propose to view while as one connective with a basic meaning and a number of different surface realizations. I show that different interpretations of while are subject to constraints coming from different directions, and that in determining how a particular interpretation arises, attributes of the sentence as well as its context need to be taken into account. In the proposed approach, syntactic structure, semantic meaning and discourse function are closely intertwined.
(May 1997: 34 pages)
Ref. No. HCRC/RP-85 Price: UKL 0.90
Renata Vieira & Massimo Poesio
Corpus-based processing of definite descriptions
To appear in Botley and McEnery eds., "Corpus-based and computational approaches to anaphora".
We discuss in this paper a system that resolves definite descriptions in written texts. A preliminary study of definite descriptions in a collection of 20 texts revealed that about 30% of the 1040 definites in the collection were cases of anaphoric definites whose antecedents had the same head noun, and 50% introduced novel discourse referents. An algorithm which resolves anaphoric definite descriptions and also identifies novel ones is proposed. We evaluated the algorithm by comparing its results with an annotation produced by human subjects. The analysis of the corpus, the implemented algorithm, and the evaluation of the results are presented in this paper.
(May 1997: 15 pages)
Ref. No. HCRC/RP-86 Price: UKL 0.66
Massimo Poesio, Renata Vieira and Simone Teufel
Resolving bridging descriptions in unrestricted texts
Our goal is to develop a system capable of treating the largest possible subset of definite descriptions in unrestricted written texts. A previous prototype resolved anaphoric uses of definite descriptions and identified some types of first-mention uses, achieving a recall of 56%. In this paper we present the latest version of our system, which handles some types of bridging references, uses WordNet as a source of lexical knowledge, and achieves a recall of 65%.
(May 1997: 6 pages)
Ref. No. HCRC/RP-87 Price: UKL 0.54
Robert Inder (University of Edinburgh), Dick Bulterman (CWI), Pedro Basagoiti (Software AG Espana), Rob Cannell (Implex Environmental Systems Limited), Mandy Haggith (University of Edinburgh), Peter Kullmann (University of Karlsruhe), Bill Ritchie (Assynt Crofters Trust), Isabel Serrasolsas (CEAM)
Sustainable Telematics for Environmental Management (STEM final report)
This is the final report from STEM, a project funded in the Environment sector of the Telematics Applications Programme of the European Union. STEM's overall aim is to provide easy-to-use telematics tools to improve the information available to land managers in making decisions. This project was a one-year study to demonstrate the feasibility of constructing a telematics application to help them access accurate and up-to-date information. It was driven by the requirements of two groups of users with responsibility for environmental planning and management in remote and environmentally sensitive Objective 1 areas of Europe. Work closely followed the original project plan. Initial requirements capture confirmed the demand for a tool like STEM, and documented the required functionality. An outline architecture for an open, extensible system to meet those needs was then designed. This is based on an environment for creating and sharing sets of configurable, domain-specific facilities for accessing, combining and appropriately presenting information for a particular purpose. Finally, over 50 issues affecting the feasibility of building such a system were identified and assessed. This showed that there will be technical challenges in bringing together the range of software components needed, and some formidable administrative and organisational obstacles to be overcome. Nevertheless, our overall conclusion is that it is feasible to build a working telematics system which will be a valuable and powerful tool to assist in environmental management.
(May 1997: 38 pages)
Ref. No. HCRC/RP-88 Price: UKL 0.95
Claire Hewson, Richard Cox & Keith Stenning
A study of Statistical Process Control training needs in small and medium UK companies
This report examines the implementation of Statistical Process Control (SPC) in a sample of six UK companies. Practices, problems, and training procedures are identified and compared in order to assess training needs. Recommendations are made for effective SPC training and a prototype training package is described.
(June 1997: 58 pages)
Ref. No. HCRC/RP-89 Price: UKL 1.90
Korin Richmond, Andrew Smith & Einat Amitay
Detecting Subject Boundaries within Text: A Language Independent Statistical Approach
We describe here an algorithm for detecting subject boundaries within text, based on a statistical lexical similarity measure. Hearst has already tackled this problem with good results (Hearst, 1994). One of her main assumptions is that a change in subject is accompanied by a change in vocabulary. Starting from this assumption, but introducing a new measure of word significance, we have been able to build a robust and reliable algorithm which exhibits improved accuracy without sacrificing language independence.
(June 1997: 8 pages)
Ref. No. HCRC/RP-90 Price: UKL 0.70
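The vocabulary-change assumption mentioned in the abstract of HCRC/RP-90 can be illustrated with a minimal sketch. It is not the algorithm of that paper: it does not implement the paper's word significance measure, which the abstract does not specify, and it uses plain word counts with cosine similarity instead. The function names, the `window` parameter and the local-minimum rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, window=2):
    """Score each gap between sentences by the lexical similarity of the
    `window` sentences before it and the `window` sentences after it.
    Element k of the result describes the gap before sentence k + 1."""
    # Tokenising on \w+ is a simplification, not a language-independent tokeniser.
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores


def boundaries(sentences, window=2):
    """Return indices i such that a subject boundary is proposed before
    sentence i, taken here (an assumption) to be strict local minima
    of the gap similarity scores."""
    scores = gap_scores(sentences, window)
    found = []
    for k in range(1, len(scores) - 1):
        if scores[k] < scores[k - 1] and scores[k] < scores[k + 1]:
            found.append(k + 1)
    return found


if __name__ == "__main__":
    text = [
        "the cat sat on the mat",
        "the cat chased the mat",
        "the cat likes the mat",
        "stock prices fell sharply today",
        "stock prices rose sharply today",
        "stock markets fell today",
    ]
    print(boundaries(text))
```

In this toy input the two halves share no vocabulary, so the similarity score drops to zero at the gap before the fourth sentence. Real text would need smoothing of the scores and a more careful choice of word weights.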
When Marking Tone Reduces Fluency: An Orthography Experiment in Cameroon
Should an alphabetic orthography for a tone language include tone marks? Opinion and practice are divided along three lines: zero marking, phonemic marking and various reduced marking schemes. This paper examines the success of phonemic tone marking for Dschang, a Grassfields Bantu language which uses tone to distinguish lexical items and some grammatical constructions.
Participants of various ages and educational backgrounds, with different levels of exposure to the orthography, were tested on location in the Western Province of Cameroon. All but one had attended classes on tone marking. Participants read texts which were marked and unmarked for tone, then added tone marks to the unmarked texts. Analysis shows that tone marking degrades reading fluency and does not help to resolve tonally ambiguous words. Experienced writers attain an accuracy score of 83.5% in adding tone marks to a text, while inexperienced writers score a mere 53%, which is not much better than chance. A reduced tone orthography is briefly described, and early indications are that it will out-perform both tone orthographies tested here. A detailed survey of other experimental work on African tone orthography lays the groundwork for the experiment.
(June 1997: 35 pages)
Ref. No. HCRC/RP-91 Price: UKL 1.30
Massimo Poesio & David R. Traum
Conversational Actions and Discourse Situations
We use the idea that actions performed in a conversation become part of the common ground as the basis for a model of context that reconciles in a general and systematic fashion the differences between the theories of discourse context used for reference resolution, intention recognition, and dialogue management. We start from the treatment of anaphoric accessibility developed in DRT, and we show first how to obtain a discourse model that, while preserving DRT's basic ideas about referential accessibility, includes information about the occurrence of speech acts and their relations. Next, we show how the different kinds of 'structure' that play a role in conversation (discourse segmentation, turn-taking, and grounding) can be formulated in terms of information about speech acts, and use this same information as the basis for a model of the interpretation of fragmentary input.
To appear in Computational Intelligence.
(September 1997: 53 pages)
Ref. No. HCRC/RP-92 Price: UKL 1.80
David McKelvie, Chris Brew & Henry Thompson
Using SGML as a Basis for Data-Intensive NLP
This paper describes the LTNSL system (McKelvie96), an architecture for writing corpus processing tools. This system is then compared with two other systems which address similar issues, the GATE system from Sheffield and the IMS Corpus Workbench. In particular we address the advantages and disadvantages of an SGML approach compared with a non-SGML database approach.
(September 1997: 23 pages)
Ref. No. HCRC/RP-93 Price: UKL 1.10