1. Department of Geography, 302 Walker, Penn State University, University Park, PA 16802;
e-mail (MacEachren): alan@essc.psu.edu
2. National Center for Health Statistics, 6525 Belcrest Road, Hyattsville, MD 20782, e-mail: lwp0@nch09a.em.cdc.gov
3. Department of Psychology, Penn State University, University Park, PA 16802
Abstract
Mapping of georeferenced health statistics has, in the past, led to insights concerning various health-environment-behavior interactions. Insights have derived from the identification of clusters of deaths on static maps (Mason et al., 1975; Pickle et al., 1987; Pickle et al., 1990) followed by comparison of the cluster locations to the mapped distribution of potential etiologic agents (Croner et al., 1992). Spatial associations identified have prompted hypotheses about the causal relations, some of which have been verified. Examples include identification of "hot spots" of esophageal cancer in China and oral cancer in the U.S. state of North Carolina (Winn et al., 1981). Static paper maps, while somewhat successful in prompting epidemiological hypotheses, impose constraints on exploration of spatial characteristics of health-environment-behavior interactions.
Dynamic visualization methods offer the potential to dramatically extend the role of maps in health analysis. This paper reports on the design and implementation of a prototype dynamic interface to georeferenced health, environmental, and demographic data, with the prototype sponsored by and developed for the U.S. National Center for Health Statistics, Centers for Disease Control and Prevention (NCHS). Two specific objectives have been delineated for the initial prototype: (1) to design alternative methods for displaying dynamic maps of death rate and risk factor data in a user-friendly computer system, and (2) to test these designs in an experiment where users attempt to draw inferences about changing death rate patterns and their relationship to risk factor patterns.
Background
While the primary objectives of the research are directed to particular issues of health data analysis, the representation and interface problems are more general ones associated with a range of quantitative spatio-temporal data. In designing a system to allow analysts to interact with data of the sort identified, we draw particularly upon two areas of research: (1) map animation (and its use for representing quantities aggregated to contiguous geographic units), and (2) representation methods for multivariate display. Each is represented by a growing body of research within cartography/geography and complementary research efforts in a range of disciplines from which Geographic Visualization (GVis) developers have borrowed ideas and approaches. A brief overview of key developments in each area is provided below.
Map Animation
Although animated cartography has been the focus of sporadic attention within the field since the late 1950s (Campbell, 1990), it is microcomputer-based animation of the past decade that has stimulated a focused effort to address the fundamental questions raised by a shift from static to dynamic maps. The research literature has expanded quickly, and here we emphasize research directed to animated mapping of enumerated data (a term used here to include derived quantities, such as death rates, calculated for counties or other contiguous units). This research can be grouped into time series and non-time series animation.
Depicting a time series is, perhaps, the most obvious use of map animation. Animations of the disease AIDS by Gould and his colleagues (1990) are prototypical of time series animation applied to enumerated data. These animations have captured attention within and beyond cartography with coverage in Time, Playboy, and other non-academic publications. In addition to demonstrating the power of animation as a rhetorical tool, these animations raise a variety of conceptual issues related to symbolization, data classification, and color schemes (MacEachren and DiBiase, 1991). In relation to symbolization, MacEachren and DiBiase advocate a hybrid method (chorodot maps) as more appropriate than either choropleth or isopleth maps. More generally, Dorling (1992) has argued against choropleth maps (either static or animated time series) for demographic statistics of any kind. Instead, he advocates population cartograms as a base to which choropleth-style depiction of other variables can be added. Whatever the symbolization form used, time series animations of classified enumerated data are often choppy and discontinuous. This problem is exacerbated if classification is applied independently to each time step (Monmonier, 1994).
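The classification issue can be illustrated with a small sketch. The Python below is not drawn from the cited work: the quantile rule, the function names, and the rates are all assumptions chosen for illustration. It contrasts class breaks computed separately for each time step with a single set of breaks computed from values pooled across all steps.

```python
def quantile_breaks(values, n_classes):
    """Return n_classes - 1 break values splitting the sorted values
    into roughly equal-count classes (a simple quantile rule)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def classify(value, breaks):
    """Return the 0-based class index of value for ascending breaks."""
    return sum(value >= b for b in breaks)


# Hypothetical rates for four areal units at three time steps
# (illustrative numbers only, rising steadily over time).
rates_by_step = [
    [10.0, 12.0, 14.0, 16.0],
    [20.0, 22.0, 24.0, 26.0],
    [30.0, 32.0, 34.0, 36.0],
]

# Independent classification: breaks are recomputed for every frame.
independent = [
    [classify(v, quantile_breaks(step, 2)) for v in step]
    for step in rates_by_step
]

# Shared classification: one set of breaks from the pooled values.
pooled = [v for step in rates_by_step for v in step]
shared_breaks = quantile_breaks(pooled, 2)
shared = [[classify(v, shared_breaks) for v in step] for step in rates_by_step]

print(independent)  # every frame is classed identically
print(shared)       # classes shift as the rates rise
```

In this toy case the independently classified frames are identical, so the rise in rates is invisible and a given class means a different rate range in each frame. With shared breaks the same class index refers to the same rate range throughout the sequence.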
With non-temporal animation of enumerated data, display order is matched to a non-temporal ordered attribute. Three sub-categories are apparent. The first uses order to signify level of attribute generalization, with the map sequentially built up from one class through several classes (Peterson, 1995). A second sub-category matches scene order to order of categories on a map having a fixed number of classes, thus highlighting spatial locations having similar attribute values (Slocum et al., 1990; Monmonier, 1992). The third sub-category involves reordering maps from a time series according to some attribute of each time period, what DiBiase et al. (1992) term "reexpression." One example that they describe involves reordering of a presidential election time series based on the magnitude of landslide voting percentages.
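The third sub-category amounts to sorting the frames of a time series by a summary attribute instead of by date. A minimal sketch in Python, assuming each frame carries one such attribute; the period labels, the `magnitude` field, and its values are hypothetical:

```python
# Hypothetical time series: one frame per period with a summary attribute.
# Labels and magnitudes are invented for illustration.
frames = [
    {"period": "t1", "magnitude": 0.12},
    {"period": "t2", "magnitude": 0.03},
    {"period": "t3", "magnitude": 0.21},
    {"period": "t4", "magnitude": 0.08},
]


def reexpress(frames, attribute):
    """Return the frames reordered by ascending value of the attribute."""
    return sorted(frames, key=lambda frame: frame[attribute])


chronological = [f["period"] for f in frames]
reordered = [f["period"] for f in reexpress(frames, "magnitude")]

print(chronological)
print(reordered)
```

Playing the frames in the reordered sequence would show the maps from the smallest to the largest magnitude, regardless of when each period occurred.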
For any animation of enumerated data, an additional consideration is the use of exploratory data analysis (EDA) methods of linked views (two or more views that change together) and focusing (highlighting subsets of data). Monmonier (1992), for example, has generated both attribute and time series choropleth animations linked to corresponding histograms and has implemented temporal brushes that aggregate data across time periods. Both linking and brushing are most useful in a dynamic multivariate environment and are, therefore, discussed in more detail below.
Multivariate Representation
Like animation, multivariate representation has a long history in cartography prior to the advent of dynamic computer systems (or computers in general). DiBiase et al. (1994) provide a comprehensive review of multivariate representation methods emphasizing those for three or more variables. Here, as above, we restrict attention to representation of quantitative data aggregated to enumeration units.
Several authors have addressed color schemes for multivariate maps. Trumbo (1981) proposes (but does not test) color guidelines designed to highlight positive associations or to make variables separable. Building from empirical evaluation of U.S. Census bivariate color schemes by Olson (1981), Brewer (1994) has developed a more comprehensive color syntactics (system of color logic) for multivariate maps in which she distinguishes among several kinds of bivariate situations. Focusing specifically on correlation between variables, Eyton (1984) proposes a unique solution to bivariate data classification and a matching color scheme. The scheme uses a white-grey-black continuum to depict values along the regression line (that describes the relationship between variables) and complementary colors (red and cyan) to signify outliers. This approach appears to be very effective in displaying positive bivariate relationships. The most widely studied alternative to color for depicting bivariate enumerated data is texture (in the form of crossed-line shading). Carstensen has demonstrated that crossed-line shading can be as effective as color (Carstensen, 1984) and that this representation method facilitates hypothesis generation about the relationships between variables mapped (Carstensen, 1986).
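The logic of a regression-based bivariate scheme can be sketched as follows. This is a deliberately simplified stand-in, not Eyton's published procedure: a residual threshold of `k` standard deviations replaces his classification, and the assignment of red to units above the line, cyan to units below it, and white-to-black to low-to-high positions along the line are assumptions made for illustration. The data are invented.

```python
from statistics import mean, pstdev

GREYS = ["white", "grey", "black"]  # assumed low-to-high ordering


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bivariate_colors(xs, ys, k=1.0):
    """Assign a color name to each unit.

    Units whose residual exceeds k standard deviations are flagged as
    outliers (red above the line, cyan below). All other units receive
    a grey level from their position along the x range.
    Assumes the x values are not all equal.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    limit = k * pstdev(residuals)
    lo, hi = min(xs), max(xs)
    colors = []
    for x, r in zip(xs, residuals):
        if r > limit:
            colors.append("red")
        elif r < -limit:
            colors.append("cyan")
        else:
            position = (x - lo) / (hi - lo)  # 0.0 to 1.0 along the line
            colors.append(GREYS[min(int(position * 3), 2)])
    return colors


# Invented data: y tracks x except for one unit far above the trend.
xs = [0, 2, 4, 5, 6, 8, 10]
ys = [0, 2, 4, 15, 6, 8, 10]
colors = bivariate_colors(xs, ys)
print(colors)
```

Units that follow the overall relationship fall on the achromatic ramp, so the single unit that departs strongly from it is the only one rendered in a chromatic color.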
An important cognitive-perceptual issue that should be considered when choosing symbols for a multivariate map (whether color, texture, or other visual variables are used) involves a distinction between integral and separable visual dimensions (Shortridge, 1982). Integral variables (sign-vehicle components) are seen as wholes rather than as independent components, making selective attention to their individual components difficult. Separable variables, on the other hand, are seen individually, making divided attention (a focus on the conjunction of sign-vehicles as a whole) difficult. MacEachren and Brewer (1995) compared a coincident visually separable display (a bivariate map in which color attributes represented one variable and a texture overlay represented reliability of those data) with a coincident visually integral display (a similar map in which both data and reliability were depicted by attributes of color). The visually separable display proved effective in allowing the map users to recognize unreliable data without impeding their ability to notice clusters and characterize patterns in the data. The coincident visually integral depiction made it difficult for users to consider data and reliability independently.
Many dynamic graphical approaches to exploring multivariate data have their roots in EDA concepts of linking and focusing (Becker et al., 1988; Buja et al., 1991). Linking can occur in time (when views that are adjacent in display time share some attribute that provides the conceptual "link"). Linking is also used to relate simultaneously visible views (the first application being with a matrix of scatterplots). With simultaneous views, linking is usually combined with the EDA focusing technique of "brushing," where interactively selecting data in one view results in automatic selection of the corresponding data locations in all other views (Becker and Cleveland, 1987; Carr et al., 1987). Linking has been adapted to cartography in the form of scatterplot-to-map links (Monmonier, 1989), links among maps, scatterplots, and temporal legends (Monmonier, 1990) and links between standard maps and cartograms (Dykes, in press).
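The core of linked brushing is a selection shared between views through a common identifier. A minimal sketch, assuming each enumeration unit has an identifier and two attribute values; the `Unit` type, the function names, and the data are hypothetical, and no rendering is attempted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str      # identifier shared by the map and the scatterplot
    risk: float       # x-axis attribute of the scatterplot
    mortality: float  # y-axis attribute of the scatterplot


def brush_scatterplot(units, x_range, y_range):
    """Return ids of units whose point lies inside the brush rectangle."""
    (x0, x1), (y0, y1) = x_range, y_range
    return {
        u.unit_id
        for u in units
        if x0 <= u.risk <= x1 and y0 <= u.mortality <= y1
    }


def brush_map(units, clicked_ids):
    """Return the scatterplot points of the units selected on the map."""
    return [(u.risk, u.mortality) for u in units if u.unit_id in clicked_ids]


# Invented units for illustration.
units = [
    Unit("A", 0.2, 110.0),
    Unit("B", 0.8, 240.0),
    Unit("C", 0.9, 260.0),
    Unit("D", 0.3, 250.0),
]

# Brushing the high-risk, high-mortality corner of the scatterplot
# yields the ids to highlight on the map.
highlighted = brush_scatterplot(units, (0.7, 1.0), (200.0, 300.0))

# Selecting units on the map yields the points to highlight in the plot.
points = brush_map(units, {"A", "D"})
print(sorted(highlighted), points)
```

Because both functions exchange only identifiers, any number of additional views could be linked by applying the same selection set to each.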
In addition to linked brushing, the EDA concept of focusing extends to single-view manipulations of data segment highlighting, such as dynamic data classification (Ferreira and Wiggins, 1990). A related kind of focusing is implemented in Calico, a system for dynamically adjusting the balance between two variables displayed on a bivariate map (Rheingans and Tebbs, 1990).
Prototype Development
The research highlighted above is directed primarily to representational and cognitive issues of implementing mapping/GVis systems. Designing and building a system also requires attention to issues prior to implementation. System and interface design should be approached at multiple levels in an effort to prevent particular hardware and software characteristics from dictating system goals or what a system is actually designed to do. A typical multi-level strategy is to direct separate attention to what the system is for, what it needs to be able to do, and how it works. Howard and MacEachren (1996) review literature relevant to these issues and propose a hierarchical approach to GVis system and interface design. Their approach has been used to guide our prototype design process. The stages of this approach are defined specifically as conceptual (the level at which what and who the system is for are considered), operational (the level at which conceptual goals are sub-divided into a set of discrete operations applicable to the data), and implementational (the level at which methods for achieving the operational goals are addressed within particular hardware, software, and problem context constraints).
As noted above, the research contract that prompted this project defined quite narrow objectives related to design and testing of an environment for display of time series mortality data and associated risk factors. The broader objective of research on mapping sponsored by the NCHS, however, is to facilitate incorporation of geographically referenced representations at various stages of health research. Thus, we approach the task of identifying conceptual level goals and associated operational level "operations" with these broader objectives in mind. This results in a prototype capable not only of meeting the specified goals but also of evolving to address anticipated demands of future analysts more aware of dynamic mapping's potential for facilitating visual thinking.
Conceptual level
At the conceptual level, several categories of goals can be defined for GVis. These range from domain-independent use goals, through domain-specific reasons for a system, to narrow task objectives within a particular application context. In relation to GVis use, MacEachren and Kraak (in press), building on earlier work in statistics by Tukey (1980) and in geography by DiBiase (1990), propose four goals: exploration, analysis, synthesis, and presentation. The project reported here emphasizes exploratory visualization. Within the project's application domain of health statistics analysis, the most general goal is one that underlies all uses of mapping and GIS for health data analysis: to understand the spatially varying factors that lead to mortality and disease and the variation in those factors for different at-risk groups in the population. Together, this domain goal and the emphasis on exploratory stages of research lead to a practical goal for the environment being designed: to develop dynamic GVis methods and tools that enhance the ability of health/statistics specialists to recognize (and draw inferences about) mortality rate patterns, risk factor patterns, relations between risk factors and mortality, and change in both mortality and risk factors (and their relations) over time.
Within these general goals, a set of application-specific sub-goals can be identified that relate to specific aspects of exploratory analysis. At this stage, these conceptual level sub-goals can be characterized as ones that (a) emphasize spatial pattern analysis (for points in time), (b) facilitate an understanding of spatio-temporal processes, (c) support analysis at multiple spatial, attribute, and temporal scales and (d) address the implications of data characteristics and data processing methods for apparent spatial and spatio-temporal patterns. This "typology" of conceptual goals is a tentative one that we expect to modify and expand as the project develops.
In relation to spatial pattern analysis, the key goals identified thus far are: (1) identify "hot spots" (clusters in geographic space) of mortality and sort real from false hot spots; (2) facilitate the search for relationships between mortality clusters and potential risk factors. For spatio-temporal analysis, goals identified include: (1) explore spatial diffusion of mortality (due to various causes, and for various at-risk groups); (2) facilitate the search for change in geographic co-variation (between mortality and risk factors) over time. Conceptual goals dealing with "scale" relate to aggregation and disaggregation of attribute, geographic, and temporal aspects of information, and include: (1) facilitate exploration of data as a whole as well as data parsed into constituent groups (by gender, age, race, etc.); (2) facilitate multiresolution analysis (spatially and temporally). In an effort to minimize visualization errors (seeing false patterns and missing real patterns), we have identified a set of conceptual goals addressing the data characteristics and the methods for representing these data, as well as the background of the users for whom the system is being designed. These include: (1) build upon specialized expertise of potential users (e.g., in statistical analysis) to introduce methods for exploratory geo-visualization with which they may be unfamiliar; (2) facilitate an understanding of data reliability.
Operational level
As noted above, achieving a specific conceptual level goal generally requires subdividing the goal into a series of sub-goals, each of which can be met by applying a particular operation to particular information. The task at the operational level of system/interface analysis is to identify these operations. While it is never possible to completely disentangle operations as concepts from their possible implementation given available tools, the intent at this stage is to determine what procedures should be available for application to information, not how to implement them. Following from the categories of conceptual goals delineated above, operations can be grouped into those related to:
Note: the section below has been updated since the ICA Proceedings were published.
Implementational level
At the implementational level of prototype development, our first step was to examine existing software environments to determine whether any one environment was suitable for prototype design and testing (and perhaps for future system development beyond the prototype). The full range of conceptual and operational goals identified above played a role in selecting a development environment. The most important criteria, however, were the initial client (NCHS) objectives of (a) designing alternative methods for displaying dynamic interactive maps of death rate and risk factor data and (b) testing these designs in an experiment where users attempt to draw inferences about changing death rate patterns and their relationship to risk factor patterns. Thus the goals of the prototype were less comprehensive and somewhat different than for the full system the prototype and its testing are expected to produce. The need was for an environment that facilitated rapid prototyping and experimental testing of the representation and interface options developed, not necessarily an environment that was ideal for full system implementation.
Characteristics of the NCHS data were also a factor in selecting our rapid prototyping environment. Data available for the project consist primarily of mortality rates and related demographic statistics aggregated to the 798 Health Service Areas (HSAs) for the conterminous U.S. The mortality rates are available for five time steps. The geographic component of the data (HSA boundaries) was provided in a standard desktop GIS format (ArcView shapefiles).
A review of several potential development environments found none that would allow us to implement easily all of the operations identified thus far and allow us to alter quickly the form in which operations were implemented. As a result, we opted for a development environment that combines ArcView (with its Avenue scripting language) and Macromedia Director (with its Lingo scripting language). Operations associated with pattern analysis and pattern comparison (of multiple variables at one time) are being implemented in ArcView. Operations associated with spatio-temporal analysis are being implemented in Director. At this stage, we have not yet implemented operations dealing with either aggregation/disaggregation or the representation of metadata and methods.
We are implementing several exploratory spatial data analysis operations within ArcView. Since relationships between mortality rates and potential risk factors (both reported as aggregate rates, or other derived measures, per HSA) are of primary importance, linked geographic brushing has been implemented. For each time step, users can highlight any points of a scatterplot to determine their location in geographic space or highlight any HSAs on a map to determine their location in bivariate (or multivariate) attribute space. As a complement to these linked views, users can select among various bivariate map depictions of the two variables displayed on any scatterplot. These depictions include representation forms designed to yield visually integral representation and others designed to yield visually separable representation (MacEachren et al., 1995). In addition, the scatterplot can be used as a dynamic legend for the maps, allowing users to interactively manipulate data category breaks. We also plan to implement a dynamic version of Eyton's (1984) equiprobability ellipse legend to facilitate the exploration of patterns of data anomalies.
Aspects of the interface that require animation are being prototyped in Director. Map sets that include mortality rates at each of five time periods along with various potential risk factors are being generated in ArcView and exported as PICT files for import as Director Cast members. In Director, an interface template has been designed that includes a map window with a temporal legend/control widget (see Kraak et al., 1997, for discussion of temporal legends). Controls being assessed include simple start-stop buttons that control whether a time series animation is running or not, an animation pace control, a frame-by-frame advance and reverse control, push buttons that access particular time slices directly, and a temporal brush that allows users to scroll through time slices. In addition to the map window and temporal controls, the basic display template includes controls for selecting the mortality cause to be displayed and the suspected risk factor to compare with it. There is also a control area through which users can manipulate whether or not the risk factor is visible and (if visible) manipulate risk factor thresholds dynamically. This focusing tool allows a user to specify that only HSAs with a value in the top X% for that risk factor (by 10% increments) will have the risk factor depicted. Various control styles implementing this tool are being compared.
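The threshold logic behind such a focusing tool can be sketched independently of the Director/Lingo prototype. The Python below is an illustration under stated assumptions, not the prototype's code: "top X%" is interpreted as a rank-based share of units, ties are broken by input order, and the function name and data are hypothetical.

```python
def top_percent_mask(values, percent):
    """Return one flag per unit: True if the unit's value ranks within
    the top `percent` of units, so its risk factor would be depicted.

    `percent` must be a multiple of 10 between 0 and 100, mirroring the
    10% increments of the control described above.
    """
    if percent % 10 != 0 or not 0 <= percent <= 100:
        raise ValueError("percent must be 0, 10, 20, ..., 100")
    n_show = round(len(values) * percent / 100)
    # Indices of units ordered from highest to lowest value.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    shown = set(ranked[:n_show])
    return [i in shown for i in range(len(values))]


# Invented risk factor values for ten units.
risk = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]

mask_30 = top_percent_mask(risk, 30)
visible_units = [i for i, flag in enumerate(mask_30) if flag]
print(visible_units)
```

Moving the control from one 10% step to the next only changes `percent`, so the set of depicted units can be recomputed on each adjustment and the map redrawn from the resulting flags.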
Summary
The research described here is continuing. Using the three-level approach to system design outlined above, we have established system/interface goals, identified operations on data that facilitate addressing those goals, and are in the process of implementing those operations in a prototype. Once implemented, we plan to use the prototype to experimentally investigate several issues related to representation of multivariate spatio-temporal information. In particular, we will address issues related to dynamic classification, linked brushing, direct manipulation of color schemes for bivariate maps, smoothness of animation, user control of animation direction and pacing, and the depiction of relationships between two georeferenced variables as they both change over time.
The research reported here is supported by a contract from the U.S. National Center for Health Statistics (DHHS, OASH, DAM #9630348), for which MacEachren is the Principal Investigator. Support from the NCHS is gratefully acknowledged. Contributions of the remaining authors are equal and names are in arbitrary alphabetical order (by first name).
References
Alan M. MacEachren
302 Walker
Dept. of Geography
The Pennsylvania State University
University Park, PA 16802
E-Mail: alan@essc.psu.edu
fax at: (814) 863-7943