GeoCAM: Representing, extracting, mapping, and interpreting movement references in text    
 

Qualitative geographic references in text (and audio) corpora from field reports, audio transcriptions, human-generated route directions, weblogs, and other sources provide potentially important information about the movement of entities (people, vehicles, weapons, etc.) and about their underlying spatial behaviors. These qualitative geographic references can only be interpreted if put in appropriate context. Human analysts are often able to interpret even vague and imprecise geographic references by inferring the correct context. They also reduce ambiguity and increase interpretation success by making use of more precise artifacts such as maps and images. But there are orders of magnitude more potentially relevant text sources than there are analysts available to extract the explicit and implicit geographic references manually. Progress has been made on geographic information retrieval from text, but it has been relatively slow and has focused primarily on extracting and disambiguating place names. Although that is an important and sometimes hard task (e.g., there are 1042 instances of the name "Columbia" in the Geographic Names Information System), place name extraction is just a small part of the challenge.

Natural language processing (NLP) is a hard problem that is best tackled by focusing on a bounded topic domain. A focus on linguistic accounts of movement makes the problem tractable, but still challenging. A key to the solution, we believe, is not to treat the problem as an NLP problem in isolation. That perspective might focus on algorithms to extract and interpret movement references, then match these to existing geographic data in order to generate visual representations on maps. Instead, we propose a more comprehensive and integrated approach in which information contained in geographic databases and other sources is used iteratively together with NLP methods to interpret geographic references in context. In addition, for this approach to work, we need strategies for formalizing the representation of geographic concepts about movement that are general enough to support integration of linguistic and other forms of geographic information (e.g., information stored in traditional geographic databases, extracted from images, accessible through web feature services, or available in phone books).

The proposed research focuses on leveraging the research team's complementary past and ongoing research to address two specific goals:

  1. Build the conceptual, theoretical, and data-model framework needed to represent, extract, map, and interpret geographic accounts of movement found in text.
  2. Apply the framework to create methods, tools, and a geovisual analytics workspace that support these tasks.

More specifically, we propose to address the challenge of exploiting accounts of movement in text through a combination of

  • formalizing spatial movement conceptual primitives
  • extending natural language processing methods (including machine learning)
  • developing strategies for using structured geographic information (in map and image databases, phone books, and other sources) to provide a contextual framework for interpretation
  • developing interfaces to support human analyst input to the process and analytical use of results
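
As a purely illustrative sketch of how the first and third items above might interact (it is not the project's data model or method), the Python fragment below represents a movement reference as a small record and uses a toy gazetteer to pick among ambiguous place names. Everything in it is an assumption made for illustration: the names MovementReference, Place, GAZETTEER, and resolve are hypothetical, the coordinates are approximate, and the rule "prefer the source/goal pair that lies closest together" is only a simple stand-in for the contextual interpretation described above.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Place:
    """One gazetteer entry (hypothetical schema)."""
    name: str
    region: str
    lat: float
    lon: float


@dataclass(frozen=True)
class MovementReference:
    """Hypothetical movement primitive: who moved, from where, to where."""
    mover: str
    source: str  # place name as written in the text
    goal: str    # place name as written in the text


# Toy gazetteer standing in for a geographic database (approximate coordinates).
GAZETTEER = [
    Place("Columbia", "South Carolina", 34.0007, -81.0348),
    Place("Columbia", "Missouri", 38.9517, -92.3341),
    Place("Columbia", "Maryland", 39.2037, -76.8610),
    Place("Baltimore", "Maryland", 39.2904, -76.6122),
]


def haversine_km(a: Place, b: Place) -> float:
    """Great-circle distance between two places in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def candidates(name: str) -> list[Place]:
    """All gazetteer entries that share the given name."""
    return [p for p in GAZETTEER if p.name == name]


def resolve(ref: MovementReference) -> tuple[Place, Place]:
    """Pick the (source, goal) candidate pair with the smallest separation.

    Assumption: the movement described is a short trip, so nearby
    candidates are more plausible than distant ones.
    """
    pairs = [(s, g) for s in candidates(ref.source) for g in candidates(ref.goal)]
    if not pairs:
        raise ValueError("no gazetteer match for source or goal")
    return min(pairs, key=lambda pair: haversine_km(*pair))


ref = MovementReference(mover="convoy", source="Baltimore", goal="Columbia")
source, goal = resolve(ref)
print(f"{ref.mover}: {source.name}, {source.region} -> {goal.name}, {goal.region}")
```

In this toy case the known source ("Baltimore") supplies the context that selects one "Columbia" out of three. A realistic system would need far richer context (path descriptions, landmarks, travel mode, analyst input) than a single distance heuristic.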