.. _intro:

************
Introduction
************

The Edinburgh Geoparser is a language processing tool designed to
detect placename references in English text and ground them against an
authoritative gazetteer so that they can be plotted on a map. The two
main processes involved are entity recognition, to find the placename
mentions and categorise them as such, followed by a ranking process
that selects the likeliest location for each place from what may be a
long list of candidates.

The :ref:`quickstart` explains how to install the software and start
using it, and there are some worked examples of how to use it, with
illustrations of the output produced, in the :ref:`examples` chapter.

The Edinburgh Geoparser was developed by Claire Grover and Richard Tobin of the
Language Technology Group (LTG) in the School of Informatics at
Edinburgh University. Over a number of years they and other colleagues
from the LTG have refined and added to the geoparser's
functionality. :ref:`appPubs` contains a list of some published papers
evaluating the geoparser's performance relative to other similar
systems, and discussing how it has been used by the LTG and our
partners in various projects.

Like many linguistic tools of this kind, the geoparser software is
designed to work in a "pipeline", where the output of one process
forms the input for the next. This construction gives flexibility and
makes it relatively easy to switch components in and out - so if you
prefer your own tokeniser to ours, say, it is easy to make the
substitution. The :ref:`pipeline` chapter explains the two steps:
geotagging, to find the placenames, and georesolution, to ground them in
space. See the :ref:`geotag` section for details on changing the
linguistic components. The :ref:`overview` chapter contains flowcharts
and diagrams of how the whole pipeline fits together.
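
The pipeline idea can be illustrated with a toy Python sketch. It is
not the geoparser's own code: every function name here is hypothetical,
and the "tagging" step is a deliberately crude stand-in. It only shows
how each stage consumes the previous stage's output, and how one
tokeniser can be swapped for another without touching the later stages.

.. code-block:: python

    import re
    from typing import Callable, List, Sequence, Tuple


    def run_pipeline(data, stages: Sequence[Callable]):
        """Feed the output of each stage into the next one."""
        for stage in stages:
            data = stage(data)
        return data


    def whitespace_tokenise(text: str) -> List[str]:
        """Hypothetical default tokeniser: split on whitespace only."""
        return text.split()


    def punctuation_aware_tokenise(text: str) -> List[str]:
        """Hypothetical replacement tokeniser: punctuation becomes its own token."""
        return re.findall(r"\w+|[^\w\s]", text)


    def tag_capitalised(tokens: List[str]) -> List[Tuple[str, bool]]:
        """Toy stand-in for a tagging stage: flag tokens that start with a capital."""
        return [(tok, tok[:1].isupper()) for tok in tokens]


    # Two pipelines that differ only in the tokeniser stage.
    default_stages = [whitespace_tokenise, tag_capitalised]
    custom_stages = [punctuation_aware_tokenise, tag_capitalised]

    tagged = run_pipeline("Visit Edinburgh, Scotland", custom_stages)

Because the stages agree only on the shape of the data passed between
them, replacing ``whitespace_tokenise`` with
``punctuation_aware_tokenise`` requires no change to ``tag_capitalised``.
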

The geoparser is configured to work with a number of different
gazetteers, as explained in the :ref:`gaz` chapter. Although primarily
designed to detect and geo-locate spatial references, the pipeline has
evolved to find and classify other entity types, *viz.* person,
organisation and time expressions, in addition to location. A range of
visualisation files can be produced, including a display that shows
all entity categories together with a map of the locations that
placenames are grounded to.

The geoparser works best with fairly short texts (up to a few pages),
for reasons that are explained in the :ref:`georesolve`
section. Therefore, if you have a very large corpus to process, it's
advisable to divide it into smaller chunks.

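
One way to prepare such chunks is sketched below in Python. This is an
illustration rather than part of the geoparser: the function name
``split_into_chunks`` is hypothetical, the 5000-character default is an
arbitrary assumption standing in for "a few pages", and paragraphs are
assumed to be separated by blank lines.

.. code-block:: python

    from typing import List


    def split_into_chunks(text: str, max_chars: int = 5000) -> List[str]:
        """Group whole paragraphs into chunks of at most max_chars characters.

        Paragraphs are never split, so a single paragraph longer than
        max_chars ends up as a chunk of its own.
        """
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks: List[str] = []
        current: List[str] = []
        current_len = 0
        for para in paragraphs:
            # The extra 2 accounts for the blank line used to rejoin paragraphs.
            added = len(para) + (2 if current else 0)
            if current and current_len + added > max_chars:
                chunks.append("\n\n".join(current))
                current = [para]
                current_len = len(para)
            else:
                current.append(para)
                current_len += added
        if current:
            chunks.append("\n\n".join(current))
        return chunks

Each returned chunk can then be written to its own file and processed
separately. Splitting on paragraph boundaries keeps sentences intact,
so placename mentions are not cut in half.
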

This documentation covers the downloadable version of the Edinburgh
Geoparser, to be installed on your own local machine. There is also an
online web demo of the Edinburgh Geoparser which can be tried out using an
example input text file `here `_.

.. There is also an online version embedded in the `Edina `_ `Unlock Text `_ service, which is described in the :ref:`Unlock ` chapter.

We expect the geoparser to continue to evolve, and already have plans
for enhancements. We welcome suggestions and collaboration, so please
get in touch if you have ideas about how we should develop the
software.