Qualitative geographic references in text (and audio) corpora from
field reports, audio transcriptions, human-generated route directions,
weblogs, and other sources provide potentially important information
about the movement of entities (people, vehicles, weapons, etc.) and
about their underlying spatial behaviors. These qualitative geographic
references can be interpreted only when placed in an appropriate context.
Human analysts are often able to interpret even vague and imprecise
geographic references by inferring the correct context. They also
reduce ambiguity and increase interpretation success by making use
of more precise artifacts such as maps and images. But there are
orders of magnitude more potentially relevant text sources than
there are analysts available to extract the explicit and implicit
geographic references manually. Although progress has been made on
geographic information retrieval from text, it has been relatively
slow and has focused primarily on extracting and disambiguating
place names. That is an important and sometimes hard task
(e.g., there are 1042 instances of the name "Columbia"
in the Geographic Names Information System), but place name extraction
is just a small part of the challenge.
Natural language processing (NLP) is a hard problem that is best tackled
by focusing on a bounded topic domain. A focus on linguistic accounts
of movement makes the problem tractable, but still challenging.
A key to the solution, we believe, is not to treat the problem as
an NLP problem in isolation. That perspective
might focus on algorithms to extract and interpret movement references,
then match these to existing geographic data in order to generate
visual representations on maps. Instead, we propose a more comprehensive
and integrated approach in which information contained in geographic
databases and other sources is used iteratively together with natural
language processing methods to interpret geographic references in
context. For this approach to work, we also need strategies for
formalizing the representation of geographic concepts about movement
that are general enough to support integration of linguistic
and other forms of geographic information (e.g., information stored in
traditional geographic databases, extracted from images, accessible
through web feature services, or available in phone books).
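As a minimal sketch of how structured geographic data could supply context
for an ambiguous reference, the Python example below resolves the name
"Columbia" by choosing the candidate closest to a place that has already
been resolved elsewhere in the same text. The gazetteer, the `Place` type,
the `resolve` function, and the nearest-candidate heuristic are all
assumptions made for illustration, not the method proposed here, and the
coordinates are approximate.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    region: str
    lat: float  # degrees
    lon: float  # degrees


# Hypothetical toy gazetteer; coordinates are approximate and illustrative.
GAZETTEER = [
    Place("Columbia", "SC", 34.00, -81.03),
    Place("Columbia", "MO", 38.95, -92.33),
    Place("Columbia", "MD", 39.20, -76.86),
]


def haversine_km(a, b):
    """Great-circle distance between two places, in kilometers."""
    r = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def resolve(name, anchor, gazetteer=GAZETTEER):
    """Pick the gazetteer entry called `name` that lies nearest to `anchor`.

    `anchor` is a place already resolved from context. Returns None when
    the gazetteer has no entry with that name.
    """
    candidates = [p for p in gazetteer if p.name == name]
    if not candidates:
        return None
    return min(candidates, key=lambda p: haversine_km(p, anchor))
```

Given an anchor such as `Place("Charleston", "SC", 32.78, -79.93)`, calling
`resolve("Columbia", anchor)` selects the South Carolina entry. A realistic
system would weigh many more contextual cues than distance alone.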
The proposed research leverages the research team's
complementary past and ongoing work to address two specific
goals:
- Build the conceptual, theoretical, and data-model framework needed
to represent, extract, map, and interpret geographic accounts
of movement found in text.
- Apply the framework to create methods, tools, and a geovisual
analytics workspace that carry out these tasks.
More specifically, we propose to address the challenge of exploiting
accounts of movement in text through a combination of
- formalizing spatial movement conceptual primitives;
- extending natural language processing methods (including machine
learning);
- developing strategies for using structured geographic information
(in map and image databases, phone books, and other sources) to
provide a contextual framework for interpretation; and
- developing interfaces to support human analyst input to the
process and analytical use of results.
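To make the notion of a movement primitive concrete, the Python sketch below
represents one account of movement as a small record and fills it from a
sentence using a single hand-written pattern. The `MovementEvent` type, its
fields, the `extract_movement` function, and the pattern are hypothetical
and deliberately simplistic. They illustrate the kind of structure that
formalization might produce, not the framework itself.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MovementEvent:
    mover: str        # the entity that moves
    origin: str       # where the movement starts
    destination: str  # where the movement ends


# Hypothetical pattern covering only sentences of the form
# "<mover> <verb> from <origin> to <destination>".
_PATTERN = re.compile(
    r"(?P<mover>.+?)\s+(?:moved|drove|traveled|walked)\s+from\s+"
    r"(?P<origin>.+?)\s+to\s+(?P<destination>.+?)(?:\.|$)"
)


def extract_movement(sentence: str) -> Optional[MovementEvent]:
    """Return a MovementEvent if the sentence matches the pattern, else None."""
    m = _PATTERN.search(sentence.strip())
    if m is None:
        return None
    return MovementEvent(m["mover"], m["origin"], m["destination"])
```

For the sentence "The convoy drove from Basra to Baghdad." this yields a
record with mover "The convoy", origin "Basra", and destination "Baghdad".
Vague or implicit accounts of movement, which are the harder cases, fall
outside what a fixed pattern like this can capture.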