Dialogue Assisted Visual Environment for GeoInformation: A Scenario


Providing natural ways for humans to access the vast amounts of geospatial information now available, and to make effective joint use of that information, is a challenging problem. When people communicate information needs to current GIS, the communication is often misinterpreted, with system responses not matching user expectations. This can be due to limited user knowledge of the formal language through which to specify queries, ambiguous specification of spatial context, and mismatches between the semantic framework of knowledge used by humans and that encoded in the GIS. In addition, the role of current systems in mediating a dialogue between human collaborators is limited to reacting to the independent actions of each user (with reactions usually limited to changing the visual display). An iterative dialogue is needed to build links among the semantic frameworks used by the environment and by different human collaborators. Here we sketch a scenario using the kind of dialogue-assisted system we envision:


The setting: Imagine the command room of a state emergency management center in which Jane Smith, Center Director, and Paul Brown, chief transportation engineer, are in front of a large-screen display linked to the state emergency management GIS. They are discussing preparations for an approaching hurricane, focusing on predictions of potential flooding and on evacuation plans, given different possible storm tracks and different evacuation routes.


The three-way dialogue below presents a small portion of what might take place in the situation outlined above, given the natural interface that our research will develop. In this scenario, DAVE_G is the multimodal computer system (Dialogue Assisted Visual Environment for Geoinformation).


Jane: "Let's look at the population distribution here (gesture circling region of interest) in the southeast."

DAVE_G: "Is this the region you mean, Jane?" (outline appears and blinks twice)

Jane: "No, I'm not interested in that area to the west of the Interstate." (gesture indicating a general direction to the north and west)

DAVE_G: (the outline adjusts, shifting the center of focus)

Jane: "That's better."

DAVE_G: (the map zooms in, and an inset appears in the corner)

Jane: "… This section (gesture pointing generally at an area of high population density, signified by a dark purple color) looks like one of our real trouble spots."


DAVE_G: (boundaries of census districts are highlighted) "These districts? There are 34,351 residents here, 48% of them with annual household incomes below $35,000."

Jane: "Yes … that's a lot of people to move and shelter. What would the flooding be like … if the storm follows this track?" (gestures at the inset map to indicate possible track)

DAVE_G: (an animation slowly spreads simulated flood waters across the map)

Jane: "Wow … how about if it passes to the south?" (gestures to delineate another possible track)

DAVE_G: (an animation again spreads simulated flood waters across the map, leaving a shadow of the previous simulation in place) "This one results in flooding of 43% less territory and 37% fewer properties."



Jane: (turns to Paul) "What's the typical traffic density on Route 17 and these two parallel routes (turns back to map, makes gesture pointing first to one, then to the other road) leading into it? It looks like a potential bottleneck."

Paul: "Let's find out." (looking at the display) "DAVE, let's see the weekday transportation model and the standard traffic patterns first."

DAVE_G: "OK, here it is." (an animation starts in which the width of the highway symbol changes with the ebb and flow of traffic throughout the day)

Paul: "Now, we will close down this road (pointing to one of the two side roads), add the people who are typically home during the day and will evacuate, and see what happens."

DAVE_G: (the resulting animation runs)

Addressing this kind of situation using today's GIS is a non-trivial, unnatural task that is prone to error. The task is unnatural in two senses: (1) most subtasks (conceptualized as single objectives) require multiple user actions, and these actions interfere with thinking and decision making; (2) users need to map their mental model of information needs to the data model (and semantic framework) of the GIS, something that non-specialist users have little chance of doing. In addition, keyboard and mouse-based interactions interfere with the human-human communication necessary for productive group work. The current best solution to these problems is to enlist a GIS technician who sits at the computer console, attempting to translate user information needs into GIS commands. The system we envision will transfer this task to the GIS.

As the scenario above illustrates, a natural GIS interface has the potential to overcome problems of current GIS use by making GIS data models and processing logic transparent to the user. To accomplish this, the system will directly interpret the user's multimodal inputs within the spatial and thematic constraints provided by the GIS and provide users with multimodal feedback in the form of speech and graphics. Doing so will increase the chance for effective application of geospatial information not only in time-critical, life-threatening situations such as the one sketched above but also in situations such as public information access and input to decisions. For the latter, the average citizen is currently blocked from taking advantage of vast stores of geospatial information by interfaces that require too much time to learn and too much prior computing expertise to use effectively. By enabling such access and use, the research supports goals identified for Public Participation GIS (PPGIS).

An important feature of a natural human-computer interface, as illustrated above, would be the absence of predefined speech and gesture commands. The resulting bimodal speech/gesture "language" would be interpreted by the computer through two-way dialogue with the human user. While much progress has been made in the natural language processing of speech, there has been very little progress in the understanding of multimodal speech/gesture human-computer interaction. Most researchers working on gesture-enabled interfaces have used some device (such as an instrumented glove or a pen) for incorporating gestures, which makes the interaction unnatural. The proposed solution is to use computer vision techniques for tracking and recognizing free hand gestures and for determining gaze. The proposed research will use the previously developed single-user, large-format iMAP system as an initial test-bed for user studies. The resulting analysis and system evolution will then lead to a GIS-enhanced test-bed (DAVE_G). We will apply a human-centered approach that considers user needs and user tasks from initial design through development and deployment.

The central objective of this research is to develop a theory of multimodal dialogue that provides a framework from which to achieve two results: (a) to develop, implement, test, and refine natural, human-centered interfaces to complex information systems that facilitate a two-way human-computer dialogue and that support human-human dialogue mediated by the computer; and (more specifically) (b) to develop, implement, test, and refine methods for engaging the modalities of speech, hand gesture, and gaze to support such dialogues with, and mediated by, an interactive dynamic map in the context of a GIS.

This material is based upon work supported by the National Science Foundation under Grant No. BCS-0113030.

web site hosted by the PSU GeoVISTA Center