Published in Cartographic Perspectives, Number 13, Fall 1992, pp. 10-19

VISUALIZING UNCERTAIN INFORMATION

Alan M. MacEachren
Department of Geography
302 Walker
The Pennsylvania State University
University Park, PA 16802
e-mail: alan@essc.psu.edu

When maps are used as visualization tools, exploration of potential relationships takes precedence over presentation of facts. In these early stages of scientific analysis or policy formulation, providing a way for analysts to assess uncertainty in the data they are exploring is critical to the perspectives they form and the approaches they decide to pursue. As a basis from which to develop methods for visualizing uncertain information, this paper addresses the difference between data quality and uncertainty, the application of Bertin's graphic variables to the representation of uncertainty, conceptual models of spatial uncertainty as they relate to kinds of cartographic symbolization, and categories of user interfaces suited to presenting data and uncertainty about that data. Also touched on is the issue of how we might evaluate our attempts to depict uncertain information on maps.
Uncertainty is a critical issue in geographic visualization due to the tendency of most people to treat both maps and computers as somehow less fallible than the humans who make the decisions on which they are based. When a GIS is used to compile, analyze, and display information, the chance for unacceptable or variable data quality is high due to the merging of multiple data layers. Together with these data quality issues, the flexibility of data manipulation that makes GIS so powerful can lead to considerable uncertainty in map displays produced at various stages of GIS analysis. This paper addresses a variety of conceptual issues underlying development of visualization tools that allow analysts to take this uncertainty into account in their research and policy formulation activities.

There is a strong tradition in cartography of attention to data quality. Only rudimentary steps, however, have been made thus far to deal with the complex issues of visualizing data quality for multidimensional data displays used in image analysis and GIS applications. The importance of this topic is evinced by the decision of the National Center for Geographic Information and Analysis (NCGIA) to make visualization of data quality the first visualization initiative undertaken by the center (NCGIA, 1989; see Note 1).

Kate Beard and Barbara Buttenfield (1991), presenting the NCGIA position, indicate that quality of spatial information "relates to accuracy, error, consistency, and reliability." These aspects of quality are meant to apply to more than locational verity. It is useful to begin consideration of quality issues with the framework of the Proposed Digital Cartographic Data Standard (Moellering, et al., 1988), incorporating locational accuracy, attribute accuracy, logical consistency (i.e., a data structure whose topology makes sense), completeness (comprehensive data and systematic ways of dealing with missing values) and finally lineage.

The above quality categories are important, but to use a GIS effectively for either scientific inquiry or policy formulation, the scope must be broadened. In risk assessment circles, the term uncertainty has gained some acceptance and I suggest that we might be better off if we follow their lead (Morgan and Henrion, 1990; Rejeski and Kapuscinski, 1990). Analysts never know the precise amount of error in any particular data object -- or they would correct the error. They are more - or less - uncertain about the available characterization of particular data objects. From this perspective alone, the term uncertainty might be a better description of what the NCGIA (and many past cartographers) have been calling quality. In addition, however, uncertainty includes something of importance beyond the narrow definition of quality that the NCGIA initiative seems to be directed toward. A brief example will illustrate the difference between a focus on quality and on uncertainty and why it is the latter that should guide our efforts.

Imagine a single census block in a city. You have sent an enumerator out to take the census. In this particular case, the response rate is 90%. In data quality terms, we might say that our population and income information for this block is of less than perfect quality because of the lack of "completeness" in the data. Further, there may be "attribute inaccuracy" in the data collected due to misunderstanding of the survey questions or deliberate misinformation about items such as income or education, or "spatial inaccuracy" due to address coding errors by the census enumerator. If, in the adjacent census block we somehow achieved 100% participation in the census, everyone understood the questions and gave truthful responses, and the enumerator made no mistakes, a data quality assessment would label that unit's data as perfect. What we will be leaving out of this assessment is the issue of variability (over both space and time and within categories). This latter point is made quite forcefully by Langford and Unwin (1991) who argue that, for the mapping of most socio-economic phenomena, a choropleth map of aggregated data for enumeration units is "a poor choice" due to extreme within-unit variability that is the rule rather than the exception.

In addition to variability due to spatial aggregation, attribute aggregation introduces further variability and, therefore, uncertainty. All data are categorized. Even when individual measurements are retained in the database, categories will be implicitly defined by the mathematical precision of individual measurements. For example, temperatures might be measured to the nearest degree. Most data in a GIS, however, will be grouped into much broader categories (e.g., soil classifications, income brackets, whether a house has indoor plumbing or not, etc.). In all of these cases, the categorization introduces uncertainty even when the data are of high quality.

We can only be certain that a particular location -- a particular data object -- fits somewhere within the attribute bounds of the categories and the spatial bounds of the enumeration unit to which it is aggregated. The aggregate totals for our census blocks disguise the variability within those census blocks. Our level of uncertainty about map locations will be a function not only of the quality of values (as defined above), but of variance around the mean values we typically use to represent the unit, and of spatial variability across the unit.
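The census block argument can be made concrete with a minimal sketch. The income values below are invented purely for illustration, and the function name unit_summary is hypothetical. Two blocks share the same mean, so a choropleth symbol based on the mean would treat them identically, yet the within-unit spread that the symbol conceals differs greatly.

```python
from statistics import mean, pstdev

def unit_summary(values):
    """Return (mean, within-unit standard deviation) for one enumeration unit."""
    return mean(values), pstdev(values)

# Invented household incomes (thousands of dollars) for two hypothetical blocks.
block_a = [38, 40, 42, 39, 41]   # nearly homogeneous block
block_b = [10, 15, 40, 65, 70]   # highly variable block

mean_a, spread_a = unit_summary(block_a)
mean_b, spread_b = unit_summary(block_b)

# Both means are 40, but the spread around the mean is far larger in block_b.
print(mean_a, round(spread_a, 2))
print(mean_b, round(spread_b, 2))
```

Under this reading, the standard deviation is only one possible stand-in for within-unit variability; it says nothing about how the variation is arranged spatially across the unit.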

In addition to spatial and attribute data quality and variability, a final uncertainty to be dealt with is temporal. The data, even if accurate and homogeneous, represent a snapshot at one point in time. Our uncertainty about their veracity will increase due to uncertainty about temporal information, resolution with which that information is specified, and the difference in time between data collection and data use. The temporally induced uncertainty will vary with kind of phenomena being represented.

When we use a GIS, the important issue is quality of the decisions we make -- about a research course to follow, an urban development policy to impose, or an environmental regulation to enforce. Whether we use the term data quality or data uncertainty matters less than whether the tool we give the GIS user is adequate for deciding how much faith to put in any particular piece of information extracted from the database. We can have highly accurate data while still having imprecise data. This lack of precision is at least as important an issue as a lack of accuracy. Precision here refers, not only to the specificity of data values in terms of significant digits, but in a more general sense to "the degree of refinement with which an operation is performed or a measurement taken" (Webster's New Collegiate Dictionary, 1974). In this sense it is an assessment of the resolution of categories by which a phenomenon is represented (i.e., categorical precision). Although, mathematically, a population density of 165.34 persons/sq. mi. would be considered precise, spatially it is not if it describes a county that is 1000 sq. mi. in size. Also, the map representation of the attribute (population density) loses its categorical precision when the data are aggregated into an attribute category ranging from 50 to 500 persons/sq. mi. Figure 1 provides examples of topics for which map uncertainty is due primarily to accuracy or categorical precision.
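The loss of categorical precision can be illustrated with a short sketch. Only the density of 165.34 and the class running from 50 to 500 persons/sq. mi. come from the text; the remaining class breaks and the function name classify are assumptions made for the example.

```python
from bisect import bisect_right

# Class breaks in persons/sq. mi. Only the 50-500 class is taken from the text;
# the other breaks are assumed for illustration.
BREAKS = [0, 50, 500, 5000]

def classify(value, breaks=BREAKS):
    """Return the (lower, upper) bounds of the class that contains value."""
    i = bisect_right(breaks, value) - 1
    i = max(0, min(i, len(breaks) - 2))   # clamp to the first or last class
    return breaks[i], breaks[i + 1]

lower, upper = classify(165.34)
class_width = upper - lower   # range of densities the map symbol cannot distinguish

print(lower, upper, class_width)
```

A value recorded to two decimal places ends up represented by a symbol that, in this sketch, stands for any density within a 450 persons/sq. mi. range.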



Representational Issues

As has been pointed out elsewhere, the term visualization has a number of definitions (MacEachren and Ganter, 1990; MacEachren, et al., 1992). Here it will be considered a human ability to develop mental images (often of relationships that have no visible form) together with the use of tools that can facilitate and augment this ability. Successful visualization tools allow our visual and cognitive processes to almost automatically focus on the patterns depicted rather than on mentally generating those patterns.

Following from the above conception of visualization, a research agenda to address visualizing uncertain information should include attention to the cognitive issues of what it means to understand attribute, spatial, and temporal uncertainty and the implications of this understanding for decision making and for symbolizing and categorizing uncertainty. At the most basic level, uncertainty can be divided into two components that might require different visualization strategies: visualizing accuracy and visualizing precision. In addition, attention should be directed toward the methodological, technical, and ergonomic issues of generating displays and creating interfaces that work. It is, of course, also essential to develop methods for assessing and measuring uncertainty before we can represent it. This latter topic, however, will not be addressed here.

Varied goals and needs - categories of interaction with data

If we continue to attack cartographic questions with our communication model visors on, we will fail to take advantage of the power that GIS and visualization tools provide. The search for the "optimal" data quality visualization tool might prove as fruitless as the search for the optimal graduated circle map. It is critical to recognize that GIS and visualization tools attached to them are used for a range of problem types that may have quite different visualization needs in general and visualization quality needs specifically. David DiBiase (1990) recently developed a graphic model of the range of uses to which graphics might be put in scientific research (fig. 2). I believe that his basic model is relevant, not only to science, but to applied spatial decision making with a GIS.



As we begin to consider the visualization of uncertainty, we need to be cognizant of this range of visualization goals and environments and the varying information requirements it implies. The kind of uncertainty and the tools used to visualize it are likely to vary across this range from the use of GIS by an EPA scientist exploring the spatial distribution of a pollutant to the use of GIS driven map displays by policy makers trying to decide which industries to add to the list of those regulated for toxic waste emission.

Graphics variables

Because many or most GIS users are not trained in cartographic symbolization and design, it will be necessary to create expert systems that logically translate information into graphic displays. Jacques Bertin, the French cartographer/graphic theorist, has had a tremendous impact on our approach to this problem. The Robinson, et al. text (1984), which is used by 50% of introductory cartography courses in the country (Fryman and Sines, 1991), cites Bertin's basic system of graphic variables (location, size, value, texture, color, orientation, and shape) as the fundamental units we can use to build a map image. Monmonier and Johnson's (1990) recent guide to map design for environmental GIS also presents Bertin's system as an important organizing concept for well designed maps. Weibel and Buttenfield (1988) in a paper on map design for GIS and Muller and Zeshen (1990) in a paper on expert systems for map design, both accept this system as a base to build from in designing expert systems for map symbolization.

An important representation issue for visualization of uncertainty, therefore, is how Bertin's graphic variables (with possible additions or modifications) might be logically matched with different kinds of data uncertainty. A critical distinction, of course, is that between ordered and differential graphic variables which can be logically associated with ordered/numerical and nominal/categorical differences among phenomena. Of Bertin's original graphic variables, size and value are most appropriate for depicting uncertainty in numerical information, while color (hue), shape, and perhaps orientation can be used for uncertainty in nominal information. Texture, although it has an order, might work best in a binary classification of "certain enough" and "not certain enough" that could be used for either nominal or numerical data.
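The pairings suggested above could serve as the seed of a rule table in the kind of expert system mentioned earlier. The sketch below simply encodes those pairings; the dictionary layout and the function name candidate_variables are assumptions, not part of any existing system.

```python
# Pairings of graphic variables with kinds of uncertain information, as
# suggested in the text. The data structure itself is an assumed illustration.
VARIABLES_BY_DATA_TYPE = {
    "numerical": ["size", "value"],
    "nominal": ["color hue", "shape", "orientation"],
}

# Texture is proposed for a two-class display: "certain enough" versus
# "not certain enough", usable with either kind of data.
BINARY_VARIABLE = "texture"

def candidate_variables(data_type, binary=False):
    """Return graphic variables that might depict uncertainty for data_type."""
    if binary:
        return [BINARY_VARIABLE]
    return list(VARIABLES_BY_DATA_TYPE[data_type])

print(candidate_variables("numerical"))
print(candidate_variables("nominal", binary=True))
```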

Although Bertin ignored it, the graphic variable that is arguably the most logical one to use for depicting uncertainty is color saturation. Saturation, added to the list of variables by Morrison (1974), is sometimes referred to as color purity. Saturation could be varied from pure hues for very certain information to unsaturated (grey) hues for uncertain information. Another variable, beyond Bertin's original seven, that seems quite promising as an uncertainty visualization tool is "focus." Presenting data "out of focus" (as you would see it with an out of focus camera), or simply at lower spatial resolution, might be an ideal way to depict uncertainty.
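A minimal sketch of the saturation idea follows. It assumes that certainty has already been expressed as a number between 0 and 1, which the text does not specify, and it uses the HSV color model from Python's standard colorsys module as one convenient way to vary saturation while holding hue fixed. The function name certainty_to_rgb and the default brightness of 0.8 are hypothetical choices.

```python
import colorsys

def certainty_to_rgb(hue, certainty, value=0.8):
    """Map certainty in [0, 1] to colour saturation for a fixed hue.

    hue is in [0, 1); the result is an (r, g, b) tuple of floats in [0, 1].
    Certainty 1 gives the pure hue; certainty 0 gives a grey of equal brightness.
    """
    saturation = max(0.0, min(1.0, certainty))   # clamp out-of-range input
    return colorsys.hsv_to_rgb(hue, saturation, value)

# A red hue (0.0) shown for certain, partly certain, and uncertain information.
for certainty in (1.0, 0.5, 0.0):
    print(certainty, certainty_to_rgb(0.0, certainty))
```

Whether equal steps in HSV saturation are perceived as equal steps in certainty is an open question that this sketch does not address.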

Symbol focus can be manipulated in at least four ways:

a) Contour crispness - The most obvious way to apply focus is to vary the "crispness" or "fuzziness" of symbol contours (edges). A certain boundary (e.g., the U.S. - Canada border) might be depicted with a sharp, narrow line, while an uncertain boundary (e.g., that between Kuwait and Iraq) might be portrayed with a broad fuzzy line that fades from the center toward the background (fig. 3). Similar "out-of-focus" symbols could be used to represent certain or uncertain location of point features, and an area may be depicted as not bounded at all, but as fading in a continuous fashion from core to periphery (fig. 4).





b) Fill clarity - For symbols having sufficient size to contain a fill that differs from the symbol's contour, characteristics of that fill can be manipulated to indicate certainty. A sharp, distinct pattern, for example, might be used to indicate certainty while a less defined pattern might indicate uncertainty (fig. 5).



c) Fog - The transparency of the "atmosphere" that an analyst views a map through can be controlled on some computer display devices. It is possible to create what, in effect, looks like a fog passing between the analyst and the map -- the thicker the fog, the more uncertain that part of the map (fig. 6).



d) Resolution - Often maps are produced in which attribute data, geographic position, and temporal position are depicted with very different resolutions. One method of communicating uncertainty would be to adjust the resolution of geographic detail so that it corresponds to that of attributes or time (e.g., adjust resolution with which coastlines are depicted on a world map to correspond to the resolution of thematic information depicted) (fig. 7).
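Two of the four manipulations, fog and resolution, lend themselves to a brief sketch. The code below is one possible reading, not a description of any existing display system: fog is treated as a simple blend of each color toward a fog color in proportion to uncertainty, and resolution is lowered by averaging square blocks of grid cells. The function names apply_fog and coarsen, the white fog color, and the sample grid are all assumptions.

```python
def apply_fog(rgb, uncertainty, fog=(1.0, 1.0, 1.0)):
    """Blend a colour toward the fog colour.

    uncertainty 0 leaves the colour unchanged; uncertainty 1 hides it completely.
    """
    a = max(0.0, min(1.0, uncertainty))
    return tuple((1 - a) * c + a * f for c, f in zip(rgb, fog))

def coarsen(grid, factor):
    """Lower the resolution of a 2-D grid by averaging factor x factor blocks.

    Assumes the number of rows and columns are multiples of factor.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

print(apply_fog((0.2, 0.4, 0.6), 0.5))
print(coarsen([[1, 3, 5, 7], [1, 3, 5, 7]], 2))
```

Contour crispness and fill clarity would need an image-processing step, such as blurring, that is beyond what a short standard-library sketch can show.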



Linking visualization tools to models of uncertainty

Different uncertainty visualization issues will arise when dealing with different kinds of data (e.g., qualitative data on land use/land cover versus quantitative data from the census). When data are quantities aggregated to units such as counties, we should consider the spatial characteristics of the phenomena represented by these quantities as we select symbolization methods to depict the uncertainty about them. One continuum of spatial characteristics that can be identified is that from discrete (spatially fragmented) to continuous (spatially comprehensive) phenomena. Both stepped and smooth continuous functions are possible (Hsu, 1979).

A second continuum relates to character of variation in the phenomenon across space. Some phenomena (e.g., tax rates) can vary quite abruptly as political boundaries are crossed while others (e.g., gallons of ground water pumped for irrigation per county) can exhibit a relatively smooth variation quite independent of the units to which data are aggregated. MacEachren and DiBiase (1991) recently proposed a series of graphic data models that represent locations in this continuity-abruptness phenomena space (fig. 8). These graphic data models correspond to a range of two dimensional symbolization methods, which include standard forms such as dot, choropleth, isopleth, and graduated symbol, along with some hybrid techniques designed to deal with the midpoints on the phenomena space axes (fig. 9).





Three research questions suggest themselves here: a) is it safe to assume that the spatial characteristics of uncertainty will mimic those of the phenomena that uncertainty is being estimated for, b) do specific symbolization methods actually communicate the particular spatial characteristics that we as cartographers associate them with (e.g., is a layer tinted isarithmic map depicting uncertainty in air pollution estimates interpreted as a smooth continuous surface or as discrete uncertainty regions) and c) what approach should be followed when a data set has multiple kinds of uncertainty associated with it and the spatial characteristics of that uncertainty vary.

User interfaces - How to merge data and uncertainty representations

Beyond the basic issue of how to represent uncertainty is the question of how and when to present the representation. This is complicated by the likelihood that GIS representations are often products of a combination of measured and model derived multivariate data. There seem to be three choices that could be used separately or in combination:

a) map pairs in which a data map is depicted side-by-side with a map of uncertainty about that data (fig. 10);



b) sequential presentation in which a user might be warned about uncertainty with an initial map which is followed by a map of the data (fig. 11), (or interactive tools that allow toggling between the data and the uncertainty representations);



c) bi-variate maps in which both the data of interest and the uncertainty estimate are incorporated in the same map (fig. 12).



Most attempts thus far to graphically depict uncertainty of spatial data have used the map pair strategy (e.g., Burrough, 1986; MacEachren, 1985; MacEachren and Davidson, 1987). Cartographers have spent relatively little time investigating the impact of sequential information presentation. The possibilities of interactive mapping and GIS, as well as animation, have, however, begun to bring attention to this issue (Taylor, 1982; Slocum, 1988). One clear avenue to explore here is the potential of hypertext to allow users to navigate through the maze of data and uncertainty representations that we might be able to provide (e.g., use of graphic scripts to guide this process (Monmonier, 1992)).

In relation to the third possibility, bi-variate maps, the U.S. Census Bureau's bi-variate choropleth maps from the 1970 census are perhaps the best known attempt to relate two variables on one map. Experimentation with those maps by several researchers indicates that untrained readers have considerable trouble reading bi-variate maps (e.g., Olson, 1981). There are, however, a number of bi-variate mapping possibilities that have not yet been investigated and previous attempts to use bi-variate maps dealt with two different variables rather than with a single variable related to its uncertainty. Color saturation (or intensity), for example, might be used as a graphic variable for depicting uncertainty on maps in which different hues are used to represent the data values of interest (e.g., on a land cover map). For printed maps in black and white, a combination of texture and value may be effective (see fig. 12). The variable of focus might be used in similar ways.

In a dynamic visualization environment, it would be possible to combine sequencing and bi-variate techniques and allow a fade from a data map, through an uncertainty map, back to the data map. For qualitative areal data (e.g., soils) Fisher (1991) has suggested an animated technique to communicate the certainty (or uncertainty) of soil classifications for particular locations. In his visualization system, duration with which pixels are displayed in a particular color is matched to the probability of that pixel being in a particular soil classification. Certain sections of the map remain static and uncertain sections exhibit continual blinking between (or among) the potential soils for that place.
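The principle behind Fisher's animated technique, as described above, can be sketched in a few lines. This is not Fisher's implementation; it is a simplified illustration in which each animation frame draws one soil class for a pixel at random, weighted by the class probabilities, so that over many frames each class is displayed for a share of the time that approximates its probability. The soil names, probabilities, and the function name animate_pixel are invented for the example.

```python
import random

def animate_pixel(class_probs, n_frames, rng):
    """Return the class displayed at one pixel in each of n_frames frames.

    class_probs maps class name -> probability that the pixel belongs to it.
    Classes are drawn independently per frame, weighted by probability.
    """
    classes = list(class_probs)
    weights = [class_probs[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n_frames)

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# A certain pixel stays static; an uncertain pixel flickers between classes.
certain_frames = animate_pixel({"loam": 1.0}, 200, rng)
uncertain_frames = animate_pixel({"loam": 0.6, "clay": 0.4}, 200, rng)

print(set(certain_frames))
print(uncertain_frames.count("loam") / len(uncertain_frames))
```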

Evaluation of the utility or effect of providing uncertainty information

It is relatively easy to think up techniques by which uncertainty might be represented. Before we try to put these techniques into practice (particularly in a public policy context) we should evaluate their potential impact. The representation of uncertainty about information in a GIS provides a unique opportunity to determine whether our efforts at map symbolization and design research over the past 40 years have provided the tools required to develop a representation system. If past perceptual and cognitive research along with the conceptual models of symbol referent relationships based on semeiotics are really useful, we should be able to use them to formulate hypotheses and design appropriate experiments in our quest for answers about visualizing uncertainty.

This possibility may tempt some of us to go back to our roots in the communication model approach to cartography. Communication of data quality or uncertainty seems to be the ideal case for which the communication model was developed. Uncertainty can be treated as a precisely defined piece of information that we want a GIS user to obtain. I am afraid, however, that if we follow this narrow information theory approach we will hit the same dead ends that we did a decade or so ago.

This time around we need to be aware of the range of human-user interactions with graphics that occur from initial data exploration to presentation. For exploratory applications, where there is no predetermined message to communicate, we cannot judge uncertainty depictions using communication effectiveness standards. We can only evaluate these depictions in terms of how they alter decision making, pattern recognition, hypothesis generation, or policy decisions. We also must be aware of the fact that our (possibly) precise uncertainty information is conditioned by the social-cultural context in which decisions about what to represent are made (e.g., a variety of estimates exist about the reliability of the U.S. Census Bureau's enumeration of homeless persons), and of the limited ability of cartographers to determine the relative importance of various kinds of quality or uncertainty information in a particular context.

In addition to the question of visualizing uncertainty, there is also a question of quality of visualizations to consider. One way to evaluate visualization of uncertainty tools, therefore, is to calibrate those tools in terms of their tendency toward type I and type II visualization errors (MacEachren and Ganter, 1990). Does providing uncertainty information (or providing it in a particular way) lead to a failure to notice patterns and relationships (type II) or to a tendency to see patterns that do not exist (type I)? Maps are re-presentations and as such are always one choice among many about how to re-present. There is always uncertainty in the choice of representation method; therefore, representing the uncertainty in our representations is an uncertain endeavor at best.
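One way such a calibration might be tallied is sketched below. It assumes an experiment in which it is known, for each trial, whether a pattern truly exists in the data and whether the analyst reported seeing one; the trial records are invented, and the function name visualization_error_rates is hypothetical.

```python
def visualization_error_rates(trials):
    """Return (type_i_rate, type_ii_rate) for a set of trials.

    Each trial is a pair (pattern_exists, pattern_reported) of booleans.
    Type I: a pattern is reported where none exists.
    Type II: an existing pattern goes unnoticed.
    A rate is None when no trial of the relevant kind was run.
    """
    trials = list(trials)
    with_pattern = [reported for exists, reported in trials if exists]
    without_pattern = [reported for exists, reported in trials if not exists]
    type_i = sum(without_pattern) / len(without_pattern) if without_pattern else None
    type_ii = with_pattern.count(False) / len(with_pattern) if with_pattern else None
    return type_i, type_ii

# Invented results from six hypothetical trials with one display design.
trials = [(True, True), (True, False), (False, True),
          (False, False), (False, False), (True, True)]
type_i_rate, type_ii_rate = visualization_error_rates(trials)
print(type_i_rate, type_ii_rate)
```

Comparing these two rates for displays with and without uncertainty information would be one way to frame the question posed above.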

Notes

1. The ideas presented here were stimulated by an invitation to participate in the National Center for Geographic Information and Analysis Specialist's Meeting on Visualization of Data Quality, Initiative 7. The paper began as a "working paper" (Visualization of Data Uncertainty: Representational Issues) that was circulated only to the 25 participants of the meeting. The paper presented here is a revision and expansion of that working paper that benefited from reaction of other participants to the initial ideas as well as from discussion on related issues raised during the meeting. I gratefully acknowledge the invitation and travel support provided by the NCGIA through their National Science Foundation Grant # SES-88 10917.

2. This variable appears to have been originally suggested by David Woodward in a seminar at Wisconsin (D. DiBiase and J. Krygier, personal communication).

3. This idea was offered by Michael Goodchild during the NCGIA Visualization of Data Quality Specialist Meeting.

References

Buttenfield, Barbara P. and Beard, M. Kate 1991. Visualizing the quality of spatial information. Technical Papers 1991 ACSM-ASPRS Annual Convention, Volume 6, Auto-Carto 10. 423-427.

DiBiase, David W. 1990. Scientific visualization in the earth sciences, Earth and Mineral Sciences, (Bulletin of the College of Earth and Mineral Sciences, The Pennsylvania State University), 59(2): 13-18.

DiBiase, David W. and Krygier, John 1992. (personal communication).

Fisher, Peter 1991. Modeling and visualizing uncertainty in geographic data, Position paper for the Visualization of Data Quality Specialist's Meeting, NCGIA Initiative 7, Castine, ME, June 8-12.

Fryman, James F. and Sines, Bonnie R. 1991. Anatomy of the introductory cartography course, Cartographic Perspectives, 8: 4-10.

Jenks, George F. 1967. The data model concept in statistical mapping. International Yearbook of Cartography, 7, 186-190.

Langford, M. and Unwin, D. J. 1991. Generating and mapping population density surfaces within a geographical information system, Cartographic Journal, (in press).

MacEachren, Alan M. 1985. Accuracy of Thematic Maps: Implications of Choropleth Symbolization, Cartographica, 22(1): 38-58.

MacEachren, Alan (in collaboration with Barbara Buttenfield, James Campbell, David DiBiase, and Mark Monmonier) Visualization, in Abler, Marcus, and Olson (eds.) Geography's Inner Worlds (in press).

MacEachren, Alan M. and Davidson, John H. 1987. Continuous Geographic Surfaces: Sample Resolution, Intermediate Value Estimation Accuracy, and Isometric Mapping, American Cartographer, 14(4): 299-320.

MacEachren, Alan M. and DiBiase, David W. 1991. Animated maps of aggregated data: Conceptual and practical problems, Cartography and Geographic Information Systems (in press).

MacEachren, Alan M. and Ganter, John H. 1990. A pattern identification approach to cartographic visualization. Cartographica, 27(2): 64-81.

Merriam-Webster 1974. Webster's New Collegiate Dictionary, Springfield, MA: G. & C. Merriam Company, p. 905.

Moellering, H., Fritz, L., Nyerges, T., Liles, B., Chrisman, N., Poeppelmeier, C., Schmidt, W. and Rugg, R. (executive committee) 1988. The Proposed National Standard for Digital Cartographic Data, The American Cartographer, 15(1): 9-142.

Monmonier, Mark 1992. Authoring Graphic Scripts: Experiences and Principles, Cartography and Geographic Information Systems (in press).

Monmonier, Mark and Johnson, Branden B. 1990. Design Guide for Environmental Maps, Trenton, NJ: New Jersey Department of Environmental Protection, Division of Science and Research.

Morgan, M. Granger and Henrion, Max 1990. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge: Cambridge University Press.

Muller, Jean-Claude and Zeshen, 1990. A knowledge based system for cartographic symbol design, The Cartographic Journal, 27(2): 24-30.

National Center for Geographical Information and Analysis, The research plan of the National Center for Geographic Information and Analysis, International Journal of Geographical Information Systems, ????.

Olson, Judy M. 1981. Spectrally encoded two-variable maps. Annals of the Association of American Geographers, 71(2): 259-276.

Slocum, Terry A. 1988. Developing an information system for choropleth maps. Proceedings of the Third International Symposium on Spatial Data Handling, August 17-19, 1988, Sydney, Australia, 293-305.

Rejeski, David and Kapuscinski, Jacques 1990. Risk modeling with geographic information systems: Approaches and Issues, Report of the U.S. Environmental Protection Agency, Office of Information Resources Management.

Robinson, A. H., Sale, R. D., Morrison, J. L., and Muehrcke, P. C. 1984. Elements of Cartography, fifth edition, New York: John Wiley & Sons.

Taylor, D. R. F. 1982. The cartographic potential of Telidon, Cartographica, 19(3&4): 18-30.

Weibel, Robert and Buttenfield, Barbara P. 1988. Map Design for Geographic Information Systems, GIS/LIS'88 Proceedings: Assessing the World, Volume I, San Antonio: ACSM, ASPRS, AAG, URISA, 350-359.

Acknowledgments

I would like to thank David DiBiase and Tony Williams for suggestions on an earlier draft of this paper, Alan Brenner for sharing ideas on bi-variate maps of uncertainty, and Catherine Reeves in the Deasy GeoGraphics Laboratory for production of most of the illustrations.

Alan M. MacEachren is a Professor in the Department of Geography, The Pennsylvania State University. e-mail contacts are encouraged [alan@essc.psu.edu].



Alan M. MacEachren
302 Walker
Dept. of Geography
The Pennsylvania State University
University Park, PA 16802
E-Mail: alan@essc.psu.edu
fax at: (814) 863-7943

