forthcoming in the Proceedings of the 18th International Cartographic Conference,
Stockholm, Sweden, June 23-27, 1997

VISUALIZING SPATIAL RELATIONSHIPS AMONG HEALTH, ENVIRONMENTAL, AND DEMOGRAPHIC STATISTICS: INTERFACE DESIGN ISSUES


Alan M. MacEachren,(1) Colin Polsky,(1) Daniel Haug,(1) David Brown,(1) Frank Boscoe,(1) Jaishree Beedasy,(1) Linda Pickle,(2) Mark Marrara(3)


1. Department of Geography, 302 Walker, Penn State University, University Park, PA 16802; e-mail (MacEachren): alan@essc.psu.edu

2. National Center for Health Statistics, 6525 Belcrest Road, Hyattsville, MD 20782, e-mail: lwp0@nch09a.em.cdc.gov

3. Department of Psychology, Penn State University, University Park, PA 16802


Abstract

Mapping of georeferenced health statistics has, in the past, led to insights concerning various health-environment-behavior interactions. These insights have derived from the identification of clusters of deaths on static maps (Mason et al., 1975; Pickle et al., 1987; Pickle et al., 1990) followed by comparison of the cluster locations to the mapped distribution of potential etiologic agents (Croner et al., 1992). The spatial associations identified have prompted hypotheses about causal relations, some of which have been verified. Examples include identification of "hot spots" of esophageal cancer in China and oral cancer in the U.S. state of North Carolina (Winn et al., 1981). Static paper maps, while somewhat successful in prompting epidemiological hypotheses, impose constraints on exploration of spatial characteristics of health-environment-behavior interactions.

Dynamic visualization methods offer the potential to dramatically extend the role of maps in health analysis. This paper reports on the design and implementation of a prototype dynamic interface to georeferenced health, environmental, and demographic data. The prototype is sponsored by and developed for the U.S. National Center for Health Statistics, Centers for Disease Control and Prevention (NCHS). Two specific objectives have been delineated for the initial prototype: (1) to design alternative methods for displaying dynamic maps of death rate and risk factor data in a user-friendly computer system, and (2) to test these designs in an experiment where users attempt to draw inferences about changing death rate patterns and their relationship to risk factor patterns.

Background

While the primary objectives of the research are directed to particular issues of health data analysis, the representation and interface problems are more general ones associated with a range of quantitative spatio-temporal data. In designing a system to allow analysts to interact with data of the sort identified, we draw particularly upon two areas of research: (1) map animation (and its use for representing quantities aggregated to contiguous geographic units), and (2) representation methods for multivariate display. Each is represented by a growing body of research within cartography/geography and complementary research efforts in a range of disciplines from which Geographic Visualization (GVis) developers have borrowed ideas and approaches. A brief overview of key developments in each area is provided below.

Map Animation

Although animated cartography has been the focus of sporadic attention within the field since the late 1950s (Campbell and Egbert, 1990), it is microcomputer-based animation of the past decade that has stimulated a focused effort to address the fundamental questions raised by a shift from static to dynamic maps. The research literature has expanded quickly, and here we emphasize research directed to animated mapping of enumerated data (a term used here to include derived quantities, such as death rates, calculated for counties or other contiguous units). This research can be grouped into time series and non-time series animation.

Depicting a time series is, perhaps, the most obvious use of map animation. Animations of the disease AIDS by Gould and his colleagues (1990) are prototypical of time series animation applied to enumerated data. These animations have captured attention within and beyond cartography, with coverage in Time, Playboy, and other non-academic publications. In addition to demonstrating the power of animation as a rhetorical tool, these animations raise a variety of conceptual issues related to symbolization, data classification, and color schemes (MacEachren and DiBiase, 1991). In relation to symbolization, MacEachren and DiBiase advocate a hybrid method (chorodot maps) as more appropriate than either choropleth or isopleth maps. More generally, Dorling (1992) has argued against choropleth maps (either static or animated time series) for demographic statistics of any kind. Instead, he advocates population cartograms as a base to which choropleth-style depiction of other variables can be added. Whatever the symbolization form used, time series animations of classified enumerated data are often choppy and discontinuous. This problem is exacerbated if classification is applied independently to each time step (Monmonier, 1994).
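
The effect of classifying each time step independently can be illustrated with a minimal sketch. It assumes quantile class breaks and uses made-up rates for four units at two time steps. The function names are hypothetical, and this is not Monmonier's (1994) minimum-change method, only a contrast between breaks computed per time step and breaks computed once from the pooled series.

```python
from bisect import bisect_right


def quantile_breaks(values, n_classes):
    """Return n_classes - 1 break values splitting the data into roughly equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_classes] for i in range(1, n_classes)]


def classify(value, breaks):
    """0-based class index: the number of breaks less than or equal to value."""
    return bisect_right(breaks, value)


def classes_independent(series, n_classes):
    """Classify each time step with breaks computed from that step alone."""
    result = {}
    for step, rates in series.items():
        breaks = quantile_breaks(rates.values(), n_classes)
        result[step] = {unit: classify(v, breaks) for unit, v in rates.items()}
    return result


def classes_pooled(series, n_classes):
    """Classify every time step with one set of breaks from the pooled series."""
    pooled = [v for rates in series.values() for v in rates.values()]
    breaks = quantile_breaks(pooled, n_classes)
    return {
        step: {unit: classify(v, breaks) for unit, v in rates.items()}
        for step, rates in series.items()
    }


# Illustrative (made-up) rates; unit "A" does not change between time steps.
series = {
    "t1": {"A": 20.0, "B": 10.0, "C": 30.0, "D": 40.0},
    "t2": {"A": 20.0, "B": 5.0, "C": 8.0, "D": 12.0},
}

independent = classes_independent(series, 2)
pooled = classes_pooled(series, 2)
print(independent["t1"]["A"], independent["t2"]["A"])  # class of A flips
print(pooled["t1"]["A"], pooled["t2"]["A"])            # class of A is stable
```

With independent breaks, unit "A" changes class between frames although its rate is constant, because the breaks themselves move. With pooled breaks its class stays fixed, so any change of class in the animation reflects a change in the data.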

With non-temporal animation of enumerated data, display order is matched to a non-temporal ordered attribute. Three sub-categories are apparent. The first uses order to signify level of attribute generalization, with the map sequentially built up from one class through several classes (Peterson, 1995). A second sub-category matches scene order to order of categories on a map having a fixed number of classes, thus highlighting spatial locations having similar attribute values (Slocum et al., 1990; Monmonier, 1992). The third sub-category involves reordering maps from a time series according to some attribute of each time period, which DiBiase et al. (1992) term "reexpression." One example that they describe involves reordering of a presidential election time series based on the magnitude of landslide voting percentages.
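
The third sub-category, reexpression, amounts to sorting the frames of a time series by an attribute other than time. A minimal sketch follows; the frame labels, the attribute name, and the values are all made up for illustration.

```python
def reexpress(frames, key):
    """Return frame labels ordered by a per-frame attribute instead of by time."""
    return [label for label, _ in sorted(frames.items(), key=lambda item: key(item[1]))]


# Hypothetical frames: label -> summary attribute of that time period.
frames = {
    "period 1": {"winning_margin": 12.0},
    "period 2": {"winning_margin": 3.5},
    "period 3": {"winning_margin": 23.0},
}

# Play the frames from the narrowest to the widest margin.
order = reexpress(frames, key=lambda frame: frame["winning_margin"])
print(order)
```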

For any animation of enumerated data, an additional consideration is the use of exploratory data analysis (EDA) methods of linked views (two or more views that change together) and focusing (highlighting subsets of data). Monmonier (1992), for example, has generated both attribute and time series choropleth animations linked to corresponding histograms and has implemented temporal brushes that aggregate data across time periods. Both linking and brushing are most useful in a dynamic multivariate environment and are, therefore, discussed in more detail below.
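
A temporal brush that aggregates data across time periods can be sketched as below. This is an illustration under stated assumptions, not Monmonier's implementation: it assumes that death counts and populations are available per unit and per time step, and it computes the aggregate rate as summed deaths over summed population. All names and values are hypothetical.

```python
def brush_rate(deaths, population, start, end, per=100_000):
    """Aggregate rate per unit over time steps start..end (inclusive, 0-based)."""
    rates = {}
    for unit in deaths:
        d = sum(deaths[unit][start:end + 1])
        p = sum(population[unit][start:end + 1])
        rates[unit] = per * d / p
    return rates


# Made-up counts for two units over five time steps.
deaths = {"unit_1": [10, 12, 8, 10, 5], "unit_2": [2, 3, 3, 3, 4]}
population = {"unit_1": [10_000] * 5, "unit_2": [5_000] * 5}

# Brush covering the second through fourth time steps.
print(brush_rate(deaths, population, 1, 3))
```

Summing counts before dividing, rather than averaging the per-step rates, keeps the aggregate rate weighted by population when populations differ between time steps.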

Multivariate Representation

Like animation, multivariate representation has a long history in cartography prior to the advent of dynamic computer systems (or computers in general). DiBiase et al. (1994) provide a comprehensive review of multivariate representation methods emphasizing those for three or more variables. Here, as above, we restrict attention to representation of quantitative data aggregated to enumeration units.

Several authors have addressed color schemes for multivariate maps. Trumbo (1981) proposes (but does not test) color guidelines designed to highlight positive associations or to make variables separable. Building from empirical evaluation of U.S. Census bivariate color schemes by Olson (1981), Brewer (1994) has developed a more comprehensive color syntactics (system of color logic) for multivariate maps in which she distinguishes among several kinds of bivariate situations. Focusing specifically on correlation between variables, Eyton (1984) proposes a unique solution to bivariate data classification and a matching color scheme. The scheme uses a white-grey-black continuum to depict values along the regression line (that describes the relationship between variables) and complementary colors (red and cyan) to signify outliers. This approach appears to be very effective in displaying positive bivariate relationships. The most widely studied alternative to color for depicting bivariate enumerated data is texture (in the form of crossed-line shading). Carstensen has demonstrated that crossed-line shading can be as effective as color (Carstensen, 1984) and that this representation method facilitates hypothesis generation about the relationships between variables mapped (Carstensen, 1986).

An important cognitive-perceptual issue that should be considered when choosing symbols for a multivariate map (whether color, texture, or other visual variables are used) involves the distinction between integral and separable visual dimensions (Shortridge, 1982). Integral variables (sign-vehicle components) are seen as wholes rather than as independent components, making selective attention to their individual components difficult. Separable variables, on the other hand, are seen individually, making divided attention (a focus on the conjunction of sign-vehicles as a whole) difficult. MacEachren and Brewer (1995) compared a coincident visually separable display (a bivariate map in which color attributes represented one variable and a texture overlay represented reliability of those data) with a coincident visually integral display (a similar map in which both data and reliability were depicted by attributes of color). The visually separable display proved effective in allowing the map users to recognize unreliable data without impeding their ability to notice clusters and characterize patterns in the data. A coincident visually integral depiction made it difficult for users to consider data and reliability independently.

Many dynamic graphical approaches to exploring multivariate data have their roots in EDA concepts of linking and focusing (Becker et al., 1988; Buja et al., 1991). Linking can occur in time (when views that are adjacent in display time share some attribute that provides the conceptual "link"). Linking is also used to relate simultaneously visible views (the first application being with a matrix of scatterplots). With simultaneous views, linking is usually combined with the EDA focusing technique of "brushing," where interactively selecting data in one view results in automatic selection of the corresponding data locations in all other views (Becker and Cleveland, 1987; Carr et al., 1987). Linking has been adapted to cartography in the form of scatterplot-to-map links (Monmonier, 1989), links among maps, scatterplots, and temporal legends (Monmonier, 1990) and links between standard maps and cartograms (Dykes, in press).

In addition to linked brushing, the EDA concept of focusing extends to single-view manipulations of data segment highlighting, such as dynamic data classification (Ferreira and Wiggins, 1990). A related kind of focusing is implemented in Calico, a system for dynamically adjusting the balance between two variables displayed on a bivariate map (Rheingans and Tebbs, 1990).

Prototype Development

The research highlighted above is directed primarily to representational and cognitive issues of implementing mapping/GVis systems. Designing and building a system also requires attention to issues prior to implementation. System and interface design should be approached at multiple levels in an effort to prevent particular hardware and software characteristics from dictating system goals or what a system is actually designed to do. A typical multi-level strategy is to direct separate attention to what the system is for, what it needs to be able to do, and how it works. Howard and MacEachren (1996) review literature relevant to these issues and propose a hierarchical approach to GVis system and interface design. Their approach has been used to guide our prototype design process. The stages of this approach are defined specifically as conceptual (the level at which what and who the system is for are considered), operational (the level at which conceptual goals are sub-divided into a set of discrete operations applicable to the data), and implementational (the level at which methods for achieving the operational goals are addressed, within particular hardware, software, and problem-context constraints).

As noted above, the research contract that prompted this project defined quite narrow objectives related to design and testing of an environment for display of time series mortality data and associated risk factors. The broader objective of research on mapping sponsored by the NCHS, however, is to facilitate incorporation of geographically referenced representations at various stages of health research. Thus, we approach the task of identifying conceptual level goals and associated operational level "operations" with these broader objectives in mind. This results in a prototype capable not only of meeting the specified goals but also of evolving to address anticipated demands of future analysts more aware of dynamic mapping's potential for facilitating visual thinking.

Conceptual level

At the conceptual level, several categories of goals can be defined for GVis. These range from domain-independent use goals, through domain-specific reasons for a system, to narrow task objectives within a particular application context. In relation to GVis use, MacEachren and Kraak (in press), building on earlier work in statistics by Tukey (1980) and in geography by DiBiase (1990), propose four goals: exploration, analysis, synthesis, and presentation. The project reported here emphasizes exploratory visualization. Within the project's application domain of health statistics analysis, the most general goal is one that underlies all uses of mapping and GIS for health data analysis: to understand the spatially varying factors that lead to mortality and disease and the variation in those factors for different at-risk groups in the population. Together, this domain goal and the emphasis on exploratory stages of research lead to a practical goal for the environment being designed: to develop dynamic GVis methods and tools that enhance the ability of health/statistics specialists to recognize (and draw inferences about) mortality rate patterns, risk factor patterns, relations between risk factors and mortality, and change in both mortality and risk factors (and their relations) over time.

Within these general goals, a set of application-specific sub-goals can be identified that relate to specific aspects of exploratory analysis. At this stage, these conceptual level sub-goals can be characterized as ones that (a) emphasize spatial pattern analysis (for points in time), (b) facilitate an understanding of spatio-temporal processes, (c) support analysis at multiple spatial, attribute, and temporal scales and (d) address the implications of data characteristics and data processing methods for apparent spatial and spatio-temporal patterns. This "typology" of conceptual goals is a tentative one that we expect to modify and expand as the project develops.

In relation to spatial pattern analysis, the key goals identified thus far are: (1) identify "hot spots" (clusters in geographic space) of mortality and sort real from false hot spots; (2) facilitate the search for relationships between mortality clusters and potential risk factors. For spatio-temporal analysis, goals identified include: (1) explore spatial diffusion of mortality (due to various causes, and for various at-risk groups); (2) facilitate the search for change in geographic co-variation (between mortality and risk factors) over time. Conceptual goals dealing with "scale" relate to aggregation and disaggregation of attribute, geographic, and temporal aspects of information, and include: (1) facilitate exploration of data as a whole as well as data parsed into constituent groups (by gender, age, race, etc.); (2) facilitate multiresolution analysis (spatially and temporally). In an effort to minimize visualization errors (seeing false patterns and missing real patterns), we have identified a set of conceptual goals addressing the data characteristics and the methods for representing these data, as well as the background of the users for whom the system is being designed. These include: (1) build upon specialized expertise of potential users (e.g., in statistical analysis) to introduce methods for exploratory geo-visualization with which they may be unfamiliar; (2) facilitate an understanding of data reliability.

Operational level

As noted above, achieving a specific conceptual level goal generally requires subdividing the goal into a series of sub-goals, each of which can be met by applying a particular operation to particular information. The task at the operational level of system/interface analysis is to identify these operations. While it is never possible to completely disentangle operations as concepts from their possible implementation given available tools, the intent at this stage is to determine what procedures should be available for application to information, not how to implement them. Following from the categories of conceptual goals delineated above, operations can be grouped into those related to:

1. spatial pattern analysis and comparison (dynamic data classification, manipulation of the "mapping" between data categories and their visual representation, overlay, correlation, linked brushing of attribute and geographic displays, zooming);

2. spatio-temporal analysis (time series generation, sequential display of time steps, animation, temporal brushing, change and rate-of-change representation);

3. aggregation and disaggregation (hierarchical aggregation of geographic units, temporal aggregation, attribute disaggregation into constituent groups, spatio-temporal smoothing);

4. metadata and methods representation (representing data reliability, representing the implications of other operators, such as those for dynamic classification).

As above, this "typology" of operations is a tentative one that is intended simply to facilitate discussion of the operations that have been identified for initial implementation. As the process of building/refining the prototype discussed here evolves, we are concurrently working toward delineation of a more comprehensive typology of spatio-temporal operations.

Note: The section below has been updated since the ICA Proceedings were published.

Implementational level

At the implementational level of prototype development, our first step was to examine existing software environments to determine whether any one environment was suitable for prototype design and testing (and perhaps for future system development beyond the prototype). The full range of conceptual and operational goals identified above played a role in selecting a development environment. The most important criteria, however, were the initial client (NCHS) objectives of (a) designing alternative methods for displaying dynamic interactive maps of death rate and risk factor data and (b) testing these designs in an experiment where users attempt to draw inferences about changing death rate patterns and their relationship to risk factor patterns. Thus the goals of the prototype were less comprehensive than, and somewhat different from, those of the full system that the prototype and its testing are expected to lead to. The need was for an environment that facilitated rapid prototyping and experimental testing of the representation and interface options developed, not necessarily an environment that was ideal for full system implementation.

Characteristics of the NCHS data were also a factor in selecting our rapid prototyping environment. Data available for the project consist primarily of mortality rates and related demographic statistics aggregated to the 798 Health Service Areas (HSAs) for the conterminous U.S. The mortality rates are available for five time steps. The geographic component of the data (HSA boundaries) was provided in a standard desktop GIS format (ArcView shapefiles).

A review of several potential development environments found none that would allow us to implement easily all of the operations identified thus far and allow us to alter quickly the form in which operations were implemented. As a result, we opted for a development environment that combines ArcView (with its Avenue scripting language) and Macromedia Director (with its Lingo scripting language). Operations associated with pattern analysis and pattern comparison (of multiple variables at one time) are being implemented in ArcView. Operations associated with spatio-temporal analysis are being implemented in Director. At this stage, we have not yet implemented operations dealing with either aggregation/disaggregation or the representation of metadata and methods.

We are implementing several exploratory spatial data analysis operations within ArcView. Since relationships between mortality rates and potential risk factors (both reported as aggregate rates, or other derived measures, per HSA) are of primary importance, linked geographic brushing has been implemented. For each time step, users can highlight any points of a scatterplot to determine their location in geographic space or highlight any HSAs on a map to determine their location in bivariate (or multivariate) attribute space. As a complement to these linked views, users can select among various bivariate map depictions of the two variables displayed on any scatterplot. These depictions include representation forms designed to yield visually integral representation and others designed to yield visually separable representation (MacEachren et al., 1995). In addition, the scatterplot can be used as a dynamic legend for the maps, allowing users to interactively manipulate data category breaks. We also plan to implement a dynamic version of Eyton's (1984) equiprobability ellipse legend to facilitate the exploration of patterns of data anomalies.
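
The two directions of linked brushing described above can be sketched in a few lines. The sketch is written in Python for illustration and is not the Avenue code used in the prototype; the record layout, function names, unit identifiers, and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    unit_id: str      # hypothetical identifier of an enumeration unit
    mortality: float  # plotted on the scatterplot's y axis
    risk: float       # plotted on the scatterplot's x axis


def brush_scatterplot(units, x_range, y_range):
    """Scatterplot -> map: IDs of units whose point lies inside the brush rectangle."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return {
        u.unit_id
        for u in units
        if x_lo <= u.risk <= x_hi and y_lo <= u.mortality <= y_hi
    }


def brush_map(units, selected_ids):
    """Map -> scatterplot: attribute-space positions of the units picked on the map."""
    return {u.unit_id: (u.risk, u.mortality) for u in units if u.unit_id in selected_ids}


# Made-up values for three units.
units = [
    Unit("HSA-001", mortality=250.0, risk=0.30),
    Unit("HSA-002", mortality=180.0, risk=0.10),
    Unit("HSA-003", mortality=260.0, risk=0.35),
]

highlight_on_map = brush_scatterplot(units, x_range=(0.25, 0.40), y_range=(200.0, 300.0))
highlight_on_plot = brush_map(units, {"HSA-002"})
print(highlight_on_map, highlight_on_plot)
```

Because both views share the unit identifier as a key, either view can drive the other; the display layer only needs to redraw the returned identifiers or points in a highlight style.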

Aspects of the interface that require animation are being prototyped in Director. Map sets that include mortality rates at each of five time periods along with various potential risk factors are being generated in ArcView and exported as PICT files for import as Director Cast members. In Director, an interface template has been designed that includes a map window with a temporal legend/control widget (see Kraak et al., 1997, for a discussion of temporal legends). Controls being assessed include simple start-stop buttons that control whether a time series animation is running or not, an animation pace control, a frame-by-frame advance and reverse control, push buttons that access particular time slices directly, and a temporal brush that allows users to scroll through time slices. In addition to the map window and temporal controls, the basic display template includes controls for selecting the mortality cause to be displayed and the suspected risk factor to compare with it. There is also a control area through which users can manipulate whether or not the risk factor is visible and (if visible) manipulate risk factor thresholds dynamically. This focusing tool allows a user to specify that only HSAs with a value in the top X% for that risk factor (by 10% increments) will have the risk factor depicted. Various control styles implementing this tool are being compared.
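
The logic of the risk factor threshold tool can be sketched as follows. The sketch is in Python rather than Lingo and is not the prototype's code; the function name and the data are hypothetical, and the handling of ties (broken by sort order) is an assumption.

```python
def top_percent_units(risk_by_unit, percent):
    """Return the units whose risk factor value is in the top `percent` of all units.

    `percent` must be a multiple of 10 between 0 and 100, matching the
    10% increments of the control described in the text.
    """
    if percent % 10 != 0 or not 0 <= percent <= 100:
        raise ValueError("percent must be a multiple of 10 between 0 and 100")
    # Integer ceiling of len * percent / 100, so a partial unit rounds up.
    n_keep = (len(risk_by_unit) * percent + 99) // 100
    ranked = sorted(risk_by_unit, key=risk_by_unit.get, reverse=True)
    return set(ranked[:n_keep])


# Made-up risk factor values for ten units, u0 (lowest) to u9 (highest).
risk = {f"u{i}": float(i + 1) for i in range(10)}

visible = top_percent_units(risk, 20)
print(sorted(visible))
```

Only the units in the returned set would have the risk factor symbol drawn; all other units would show the mortality depiction alone.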

Summary

The research described here is continuing. Using the three-level approach to system design outlined above, we have established system/interface goals, identified operations on data that facilitate addressing those goals, and are in the process of implementing those operations in a prototype. Once the operations are implemented, we plan to use the prototype to experimentally investigate several issues related to representation of multivariate spatio-temporal information. In particular, we will address issues related to dynamic classification, linked brushing, direct manipulation of color schemes for bivariate maps, smoothness of animation, user control of animation direction and pacing, and the depiction of relationships between two georeferenced variables as they both change over time.

Acknowledgment & author order

The research reported here is supported by a contract from the U.S. National Center for Health Statistics (DHHS, OASH, DAM #9630348), for which MacEachren is the Principal Investigator. Support from the NCHS is gratefully acknowledged. Contributions of the remaining authors are equal and names are in arbitrary alphabetical order (by first name).

References

Becker, R.A. and Cleveland, W.S. (1987). Brushing Scatterplots. Technometrics, 29: 127-142.

Becker, R.A., Cleveland, W.S. and Wilks, A.R. (1988). Dynamic graphics for data analysis. Dynamic Graphics for Statistics. W. S. Cleveland and M. E. McGill ed. 331-350.

Brewer, C.A. (1994). Color use guidelines for mapping and visualization. Visualization in Modern Cartography. Oxford, UK, Pergamon. A. M. MacEachren and D. R. F. Taylor ed. 123-147.

Buja, A., McDonald, J.A., Michalak, J. and Stuetzle, W. (1991). Interactive data visualization using focusing and linking. Proceedings, Visualization '91, San Diego, CA, IEEE. 156-163.

Campbell, C.S. and Egbert, S.L. (1990). Animated cartography: Thirty years of scratching the surface. Cartographica, 27(2): 24-46.

Carr, D.B., Littlefield, R.J., Nicholson, W.L. and Littlefield, J.S. (1987). Scatterplot matrix techniques for large N. Journal of the American Statistical Association, 82(398): 424-436.

Carstensen, L.W. (1984). Perceptions of variable similarity on bivariate choropleth maps. Cartographic Journal, 21: 23-29.

Carstensen, L.W. (1986). Hypothesis testing using univariate and bivariate choropleth maps. American Cartographer, 13(3): 231-251.

Croner, C.M., Pickle, L.W., Wolf, D.R. and White, A.A. (1992). A GIS approach to hypothesis generation in epidemiology. ASPRS/ACSM/RT '92, Washington, D. C., August 3-8, 1992, ASPRS/ACSM. 275-283.

DiBiase, D. (1990). Visualization in the earth sciences. Earth and Mineral Sciences, Bulletin of the College of Earth and Mineral Sciences, Penn State Univ., 59(2): 13-18.

DiBiase, D., MacEachren, A.M., Krygier, J.B. and Reeves, C. (1992). Animation and the role of map design in scientific visualization. Cartography & GIS, 19(4): 201-214.

DiBiase, D., Reeves, C., Krygier, J., MacEachren, A.M., von Wyss, M., Sloan, J. and Detweiller, M. (1994). Multivariate display of geographic data: Applications in earth system science. Visualization in Modern Cartography. Oxford, UK, Pergamon. A. M. MacEachren and D. R. F. Taylor ed. 287-312.

Dorling, D. (1992). Stretching space and splicing time: From cartographic animation to interactive visualization. Cartography & GIS, 19(4): 215-227.

Dykes, J. (in press). Computers & Geosciences.

Eyton, J.R. (1984). Complementary-color two-variable maps. Annals of the Association of American Geographers, 74: 477-490.

Ferreira, J. and Wiggins, L. (1990). The density dial: A visualization tool for thematic mapping. GeoInfo Systems, 1: 69-71.

Gould, P., DiBiase, D. and Kabel, J. (1990). Le SIDA: la carte animee comme rhetorique cartographique appliquee. Mappe Monde, 90(1): 21-26.

Howard, D. and MacEachren, A.M. (1996). Interface design for geographic visualization: Tools for representing reliability. Cartography & GIS, 23(2): 59-77.

Kraak, M.-J., Edsall, R. and MacEachren, A.M. (1997). Cartographic animation and legends for temporal maps: Exploration and or interaction. 18th International Cartographic Conference, Stockholm, June 23-27, 1997, ICA.

MacEachren, A.M. and Brewer, C.A. (1995). Reliability Representation for the NCHS Mortality Atlas. Hyattsville, MD, National Center for Health Statistics. Project No. RM91.2

MacEachren, A.M., Brewer, C.A. and Pickle, L. (1995). Mapping health statistics: Representing data reliability. Proceedings of the 17th International Cartographic Conference, Barcelona, Spain, September 3-9, International Cartographic Association.

MacEachren, A.M. and DiBiase, D.W. (1991). Animated maps of aggregate data: Conceptual and practical problems. Cartography and Geographic Information Systems, 18(4): 221-229.

MacEachren, A.M. and Kraak, M.-J. (in press). Exploratory cartographic visualization: Advancing the agenda. Computers and Geosciences.

Mason, T.J., McKay, F.W., Hoover, R., Blot, W.J. and Fraumeni, J.F.J. (1975). Atlas of Cancer Mortality for U. S. Counties: 1950-1969. Washington, D. C., USGPO.

Monmonier, M. (1989). Geographic brushing: Enhancing exploratory analysis of the scatterplot matrix. Geographical Analysis, 21(1): 81-84.

Monmonier, M. (1990). Strategies for the visualization of geographic time-series data. Cartographica, 27(1): 30-45.

Monmonier, M. (1992). Authoring Graphics Scripts: Experiences and Principles. Cartography & GIS, 19(4): 247-260.

Monmonier, M. (1994). Minimum-change categories for dynamic temporal choropleth maps. Journal of the Pennsylvania Academy of Science, 68(1): 42-47.

Olson, J.M. (1981). Spectrally encoded two-variable maps. Annals of the Association of American Geographers, 71(2): 259-276.

Peterson, M.P. (1995). Interactive and Animated Cartography. Englewood Cliffs, NJ: Prentice Hall.

Pickle, L.W., Mason, T.J., Howard, N., Hoover, R. and Fraumeni, J.F.J. (1987). Atlas of U. S. Cancer Mortality among Whites: 1950-1980. Washington, D. C., USGPO.

Pickle, L.W., Mason, T.J., Howard, N., Hoover, R. and Fraumeni, J.F.J. (1990). Atlas of U. S. Cancer Mortality among nonwhites: 1950-1980. Washington, D. C., USGPO.

Rheingans, P. and Tebbs, B. (1990). A tool for dynamic explorations of color mappings. Computer Graphics, 24(2): 145-146.

Shortridge, B.G. (1982). Stimulus processing models from psychology: can we use them in cartography? American Cartographer, 9: 155-167.

Slocum, T.A., Roberson, S.H. and Egbert, S.L. (1990). Traditional versus sequenced choropleth maps: An experimental investigation. Cartographica, 27(1): 67-88.

Trumbo, B.E. (1981). A theory for coloring bivariate statistical maps. The American Statistician, 35(4): 220-226.

Tukey, J.W. (1980). We need both exploratory and confirmatory. The American Statistician, 34(1): 23-25.

Winn, D.M., Blot, W.J., Shy, C.M., Pickle, L.W., Toledo, A. and Fraumeni, J.F.J. (1981). Snuff dipping and oral cancer among women in the southern United States. New England Journal of Medicine, 304: 745-749.


Alan M. MacEachren
302 Walker
Dept. of Geography
The Pennsylvania State University
University Park, PA 16802
E-Mail: alan@essc.psu.edu
fax at: (814) 863-7943

