
Dialogue Assisted Visual Environment for GeoInformation: A Scenario
Providing natural ways for humans to access the vast
amounts of geospatial information now available, and to make effective
joint use of that information, is a challenging problem. When people
communicate information needs to current GIS, the communication is
often misinterpreted, with system responses not matching user expectations.
This can be due to limited user knowledge of the formal language through
which to specify queries, ambiguous specification of spatial context,
and mismatches between the semantic framework of knowledge used by
humans and that encoded in the GIS. In addition, the role of current
systems in mediating a dialogue between human collaborators is limited
to reacting to the independent actions of each user (with reactions
usually limited to changing the visual display). An iterative dialogue
is needed to build links among the semantic frameworks used by the
environment and by different human collaborators. Here we sketch a
scenario using the kind of dialogue-assisted system we envision:
The setting: Imagine the command room of a state emergency
management center in which Jane Smith, Center Director, and Paul
Brown, chief transportation engineer, are in front of a large-screen
display linked to the state emergency management GIS. They are discussing
preparations for an approaching hurricane, focusing on predictions
of potential flooding and on evacuation plans, given different possible
storm tracks and different evacuation routes.
The three-way dialogue below presents a small portion of what might
take place in the situation outlined above, given the natural interface
that our research will develop. In this scenario, DAVE_G is
the multimodal computer system (Dialogue Assisted Visual Environment
for Geoinformation).

Jane: "Let's look at the population distribution here (gesture circling
region of interest) in the southeast."
DAVE_G: "Is this the region you mean, Jane?"
(outline appears and blinks twice)
Jane: "No, I'm not interested in that area to the
west of the Interstate." (gesture indicating a general direction
to the north and west)
DAVE_G: (the outline adjusts, shifting the center of focus)
Jane: "That's better."
DAVE_G: (the map zooms in, and an inset appears in the
corner)
Jane: "This section (gesture pointing generally at an area of high
population density, signified by a dark purple color) looks like one
of our real trouble spots."
DAVE_G: (boundaries of census districts are highlighted) "These districts?
There are 34,351 residents here, 48% having annual household
incomes less than $35,000."
Jane: "Yes, that's a lot of people to move and shelter. What would the
flooding be like if the storm follows this track?" (gestures at the
inset map to indicate possible track)
DAVE_G: (an animation slowly spreads simulated flood waters
across the map)
Jane: "Wow, how about if it passes to the south?"
(gestures to delineate another possible track)
DAVE_G: (an animation, again, spreads simulated flood waters
across the map - leaving a shadow of the previous simulation in
place) "This one results in flooding of 43% less territory
and 37% fewer properties."

Jane: (turns to Paul) "What's the typical
traffic density on Route 17 and these two parallel routes (turns
back to map, makes gesture pointing first to one, then to the
other road) into it? It looks like a potential bottleneck."
Paul: "Let's find out." (looking at the display)
"DAVE, let's see the weekday transportation model and the
standard traffic patterns first."
DAVE_G: "OK, here it is." (an animation starts in
which the width of the highway symbol changes with the ebb and
flow of traffic throughout the day)
Paul: "Now, we will close down this road (pointing to
one of the two side roads), add the people who are typically home
during the day and will evacuate, and see what happens."
DAVE_G: (the resulting animation runs)
Addressing this
kind of situation, using today's GIS, is a non-trivial, unnatural
task that is prone to error. The task is not natural in two senses:
(1) most subtasks (conceptualized as single objectives) require multiple
user actions, actions that interfere with thinking and decision making;
(2) users need to map their mental model of information needs to the
data model (and semantic framework) of the GIS, something that non-specialist
users have little chance to do. In addition, keyboard and mouse-based
interactions interfere with the human-human communication necessary
for productive group work. The current best solution to these problems
is to enlist a GIS technician who sits at the computer console, attempting
to translate user information needs into GIS commands. The system
we envision will transfer this task to the GIS.
As the scenario
above illustrates, a natural GIS interface has the potential to overcome
problems of current GIS use by making GIS data models and processing
logic transparent to the user. To accomplish this, the system will
interpret directly the user's multimodal inputs within the spatial
and thematic constraints provided by the GIS and provide users with
multimodal feedback in the form of speech and graphics. Doing so will
increase the chance for effective application of geospatial information
not only in time-critical, life-threatening situations such as the one
sketched above, but also in situations such as public information access
and input to decisions. For the latter, the average citizen is currently
blocked from taking advantage of vast stores of geospatial information
by interfaces that require too much time to learn and too much prior
computing expertise to use effectively. By enabling such access and
use, the research supports goals identified for Public Participation
GIS (PPGIS).
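
The interpretation step described above, in which a spoken request is resolved against a gesture within the spatial constraints held by the GIS, might be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the DAVE_G implementation: the names `District`, `CircleGesture`, and `resolve_deictic_query`, the use of district centroids, and the small list of deictic words are all hypothetical and chosen for brevity.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional


@dataclass
class District:
    """Hypothetical GIS feature: a census district reduced to its centroid."""
    name: str
    x: float
    y: float
    population: int


@dataclass
class CircleGesture:
    """Hypothetical recognized gesture: a circled region in map coordinates."""
    x: float
    y: float
    radius: float


# Assumed, deliberately tiny vocabulary of words that point at the map.
DEICTIC_WORDS = {"here", "this", "these", "that"}


def resolve_deictic_query(
    utterance: str,
    gesture: Optional[CircleGesture],
    districts: List[District],
) -> List[District]:
    """Return the districts whose centroid lies inside the gestured circle.

    An empty list means the speech and gesture could not be combined, which
    in a dialogue system would be the cue to ask a clarifying question.
    """
    words = {w.strip('.,?!"').lower() for w in utterance.split()}
    if gesture is None or not (words & DEICTIC_WORDS):
        return []
    return [
        d for d in districts
        if hypot(d.x - gesture.x, d.y - gesture.y) <= gesture.radius
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        District("A", 1.0, 1.0, 1200),
        District("B", 5.0, 5.0, 3000),
        District("C", 2.0, 1.5, 800),
    ]
    circle = CircleGesture(1.5, 1.2, 1.0)
    hits = resolve_deictic_query(
        "Let's look at the population distribution here", circle, sample
    )
    print([d.name for d in hits], sum(d.population for d in hits))
```

Returning an empty result instead of guessing reflects the scenario's emphasis on dialogue: when the inputs are ambiguous, the system asks ("Is this the region you mean, Jane?") before acting.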
An important
feature of a natural human-computer interface, as illustrated above,
would be the absence of predefined speech and gesture commands. The
resulting bimodal speech/gesture "language" would be interpreted
by the computer through two-way dialogue with the human user. While
much progress has been made in the natural language processing of
speech, there has been very little progress in the understanding of
multimodal speech/gesture human-computer interaction. Most researchers
working on gesture-enabled interfaces have used some device (such
as an instrumented glove or a pen) for incorporating gestures, which
makes the interaction unnatural. The proposed solution is to use computer vision
techniques for tracking and recognizing free hand gestures and for
determining gaze. The research proposed will use the previously developed
single-user, large format iMAP system as an initial test-bed for user
studies. The resulting analysis and system evolution will then lead
to a GIS-enhanced test-bed (DAVE_G). We will apply a human-centered
approach that considers user needs and user tasks from initial design
through development and deployment.
The central objective
of this research is to develop a theory of multimodal dialogue that
provides a framework from which to achieve two results: (a) to develop,
implement, test, and refine natural, human-centered interfaces to
complex information systems that facilitate a two-way human-computer
dialogue and that support human-human dialogue mediated by the computer;
and (more specifically) (b) to develop, implement, test, and refine
methods for engaging the modalities of speech, hand gesture, and gaze
to support such dialogues with, and mediated by, an interactive dynamic
map in the context of a GIS.