There is a strong tradition in cartography of attention to data quality. Only rudimentary steps, however, have been made thus far to deal with the complex issues of visualizing data quality for multidimensional data displays used in image analysis and GIS applications. The importance of this topic is evinced by the decision of the National Center for Geographic Information and Analysis (NCGIA) to make visualization of data quality the first visualization initiative undertaken by the center (NCGIA, 1989).¹
Kate Beard and Barbara Buttenfield (1991), presenting the NCGIA position, indicate that quality of spatial information "relates to accuracy, error, consistency, and reliability." These aspects of quality are meant to apply to more than locational verity. It is useful to begin consideration of quality issues with the framework of the Proposed Digital Cartographic Data Standard (Moellering et al., 1988), which incorporates locational accuracy, attribute accuracy, logical consistency (i.e., a data structure whose topology makes sense), completeness (comprehensive data and systematic ways of dealing with missing values), and, finally, lineage.
The above quality categories are important, but to use a GIS effectively for either scientific inquiry or policy formulation, the scope must be broadened. In risk assessment circles, the term uncertainty has gained some acceptance, and I suggest that we would be better off following their lead (Morgan and Henrion, 1990; Rejeski and Kapuscinski, 1990). Analysts never know the precise amount of error in any particular data object (or they would correct the error). They are more or less uncertain about the available characterization of particular data objects. From this perspective alone, the term uncertainty might be a better description of what the NCGIA (and many past cartographers) have been calling quality. Uncertainty, however, also includes something important beyond the narrow definition of quality toward which the NCGIA initiative seems to be directed. A brief example will illustrate the difference between a focus on quality and a focus on uncertainty, and why it is the latter that should guide our efforts.
Imagine a single census block in a city. You have sent an enumerator out to take the census. In this particular case, the response rate is 90%. In data quality terms, we might say that our population and income information for this block is of less than perfect quality because of the lack of "completeness" in the data. Further, there may be "attribute inaccuracy" in the data collected, due to misunderstanding of the survey questions or deliberate misinformation about items such as income or education, or "spatial inaccuracy" due to address coding errors by the census enumerator. If in the adjacent census block we somehow achieved 100% participation, everyone understood the questions and gave truthful responses, and the enumerator made no mistakes, a data quality assessment would label that unit's data as perfect. What this assessment leaves out is variability (over space, over time, and within categories). This latter point is made quite forcefully by Langford and Unwin (1991), who argue that, for the mapping of most socio-economic phenomena, a choropleth map of aggregated data for enumeration units is "a poor choice" because extreme within-unit variability is the rule rather than the exception.
In addition to variability due to spatial aggregation, attribute aggregation introduces further variability and, therefore, uncertainty. All data are categorized. Even when individual measurements are retained in the database, categories will be implicitly defined by the mathematical precision of individual measurements. For example, temperatures might be measured to the nearest degree. Most data in a GIS, however, will be grouped into much broader categories (e.g., soil classifications, income brackets, whether a house has indoor plumbing or not). In all of these cases, the categorization introduces uncertainty even when the data are of high quality.
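The claim that every stored value implicitly stands for a category can be illustrated with a minimal Python sketch. It assumes a reading rounded to the nearest degree and a set of hypothetical income brackets (the bracket limits are invented for illustration, not taken from any census); each function returns the range of true values consistent with what the database actually holds.

```python
from bisect import bisect_right

def rounding_interval(recorded: float, step: float = 1.0) -> tuple[float, float]:
    """Range of true values consistent with a reading rounded to the nearest `step`."""
    half = step / 2.0
    return (recorded - half, recorded + half)

# Hypothetical lower limits of income brackets (illustrative, not census values).
BRACKET_LIMITS = [0, 15_000, 30_000, 50_000, 75_000]

def bracket_interval(value: float) -> tuple[float, float]:
    """Return the [lower, upper) bounds of the bracket that contains `value`.

    The top bracket is open-ended, so its upper bound is infinity.
    """
    i = bisect_right(BRACKET_LIMITS, value) - 1
    if i < 0:
        raise ValueError("value lies below the lowest bracket")
    lower = BRACKET_LIMITS[i]
    upper = BRACKET_LIMITS[i + 1] if i + 1 < len(BRACKET_LIMITS) else float("inf")
    return (lower, upper)

# A temperature recorded as 21 degrees stands for anything from about 20.5 to 21.5.
print(rounding_interval(21))
# A household coded only by bracket stands for anything within that bracket.
print(bracket_interval(42_000))
```

However carefully the household with an income of 42,000 was enumerated, once it is coded only by bracket the map can say no more than that its income lies somewhere between 30,000 and 50,000.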
We can only be certain that a particular location (a particular data object) fits somewhere within the attribute bounds of the categories and the spatial bounds of the enumeration unit to which it is aggregated. The aggregate totals for our census blocks disguise the variability within those census blocks. Our level of uncertainty about map locations will be a function not only of the quality of values (as defined above), but also of the variance around the mean values we typically use to represent the unit, and of the spatial variability across the unit.
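A small sketch can show how an aggregate disguises within-unit variability. The two blocks below are hypothetical, and the population standard deviation and coefficient of variation are simply one conventional way to summarize spread, not a measure proposed here; spatial variability across the unit would require locations as well and is not modeled.

```python
from statistics import mean, pstdev

def summarize_unit(values: list[float]) -> dict[str, float]:
    """Summarize an enumeration unit by its mean and the spread the mean hides."""
    m = mean(values)
    sd = pstdev(values)  # population standard deviation of the unit's members
    cv = sd / m if m != 0 else float("nan")  # coefficient of variation
    return {"mean": m, "sd": sd, "cv": cv}

# Hypothetical household incomes (thousands of dollars) in two adjacent blocks.
block_a = [40, 40, 40, 40, 40]
block_b = [10, 20, 40, 60, 70]

for name, block in (("block_a", block_a), ("block_b", block_b)):
    s = summarize_unit(block)
    print(f"{name}: mean={s['mean']:.1f} sd={s['sd']:.1f} cv={s['cv']:.2f}")
```

Both blocks have a mean of 40, so a choropleth map would shade them identically, yet every household in block_a matches that mean while the households in block_b range from 10 to 70.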
In addition to spatial and attribute data quality and variability, a final uncertainty to be dealt with is temporal. The data, even if accurate and homogeneous, represent a snapshot at one point in time. Our uncertainty about their veracity will increase with uncertainty about the temporal information itself, with the coarseness of the resolution at which that information is specified, and with the time elapsed between data collection and data use. The temporally induced uncertainty will vary with the kind of phenomenon being represented.
When we use a GIS, the important issue is the quality of the decisions we make, whether about a research course to follow, an urban development policy to impose, or an environmental regulation to enforce. Whether we use the term data quality or data uncertainty matters less than whether the tools we give GIS users are adequate for deciding how much faith to put in any particular piece of information extracted from the database. We can have highly accurate data that are nevertheless imprecise, and this lack of precision is at least as important an issue as a lack of accuracy. Precision here refers not only to the specificity of data values in terms of significant digits, but, in a more general sense, to "the degree of refinement with which an operation is performed or a measurement taken" (Webster's New Collegiate Dictionary, 1974). In this sense it is an assessment of the resolution of the categories by which a phenomenon is represented (i.e., categorical precision). Although, mathematically, a county population density of 165.34 persons/sq. mi. would be considered precise, spatially it is not if that county is 1000 sq. mi. in size. The map representation of the attribute (population density) also loses its attribute categorical precision when the data are aggregated into an attribute category ranging from 50 to 500 persons/sq. mi. Figure 1 provides examples of topics for which map uncertainty is due primarily to accuracy or categorical precision.
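The two senses of precision in the density example can be separated in a short worked sketch. The county population of 165,340 is hypothetical, chosen only so that the density equals the 165.34 persons/sq. mi. used in the text, and every class break other than the 50 to 500 class is an assumption.

```python
def density(population: float, area_sq_mi: float) -> float:
    """Persons per square mile for one enumeration unit."""
    return population / area_sq_mi

# Hypothetical class breaks (persons/sq. mi.); only the 50-500 class comes from the text.
CLASS_BREAKS = [0, 50, 500, 5_000]

def choropleth_class(value: float) -> tuple[float, float]:
    """Return the [lower, upper) class interval a choropleth map would show."""
    for lower, upper in zip(CLASS_BREAKS, CLASS_BREAKS[1:]):
        if lower <= value < upper:
            return (lower, upper)
    raise ValueError("value falls outside the class breaks")

# Hypothetical county: population chosen so the density matches the text's 165.34.
county_area = 1_000.0
d = density(165_340, county_area)
lower, upper = choropleth_class(d)
print(f"stored value: {d:.2f} persons/sq. mi. (mathematically precise)")
print(f"spatial support: one value for {county_area:.0f} sq. mi.")
print(f"mapped class: {lower} to {upper} (width {upper - lower})")
```

The stored value carries two decimal places, but it is one number for the whole 1000 sq. mi., and after classification the reader can recover only an interval 450 persons/sq. mi. wide.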

Following from the above conception of visualization, a research agenda to address visualizing uncertain information should include attention to the cognitive issues of what it means to understand attribute, spatial, and temporal uncertainty, and to the implications of this understanding for decision making and for symbolizing and categorizing uncertainty. At the most basic level, uncertainty can be divided into two components, accuracy and precision, that might require different visualization strategies. In addition, attention should be directed toward the methodological, technical, and ergonomic issues of generating displays and creating interfaces that work. It is, of course, also essential to develop methods for assessing and measuring uncertainty before we can represent it; this latter topic, however, will not be addressed here.

An important representation issue for visualization of uncertainty, therefore, is how Bertin's graphic variables (with possible additions or modifications) might be logically matched with different kinds of data uncertainty. A critical distinction, of course, is that between ordered and differential graphic variables, which can be logically associated with ordered/numerical and nominal/categorical differences among phenomena, respectively. Of Bertin's original graphic variables, size and value are most appropriate for depicting uncertainty in numerical information, while color (hue), shape, and perhaps orientation can be used for uncertainty in nominal information. Texture, although it has an order, might work best in a binary classification of "certain enough" and "not certain enough" that could be used for either nominal or numerical data.
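The pairings proposed here can be recorded as a simple lookup, together with the binary texture coding. This is a sketch only: certainty is assumed to be scaled from 0 to 1, and the threshold of 0.8 and the texture names are hypothetical placeholders rather than recommendations.

```python
# Candidate graphic variables for depicting uncertainty, as proposed in the text.
CANDIDATE_VARIABLES = {
    "numerical": ("size", "value"),
    "nominal": ("hue", "shape", "orientation"),  # orientation only tentatively
}

def texture_class(certainty: float, threshold: float = 0.8) -> str:
    """Binary texture coding usable for either nominal or numerical data.

    `certainty` is assumed to lie in [0, 1]; the threshold and the texture
    names are hypothetical placeholders.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in [0, 1]")
    return "plain" if certainty >= threshold else "stipple"

print(CANDIDATE_VARIABLES["nominal"])
print(texture_class(0.95), texture_class(0.4))
```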
Although Bertin ignored it, the graphic variable that is arguably the most logical one to use for depicting uncertainty is color saturation. Saturation, added to the list of variables by Morrison (1974), is sometimes referred to as color purity. Saturation could be varied from pure hues for very certain information to unsaturated (grey) hues for uncertain information. Another variable, beyond Bertin's original seven, that seems quite promising as an uncertainty visualization tool is "focus." Presenting data "out of focus" (as you would see it with an out-of-focus camera), or simply at lower spatial resolution, might be an ideal way to depict uncertainty.
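One way to prototype saturation as an uncertainty variable is to hold hue and value fixed and let saturation track certainty. The sketch below uses Python's standard colorsys module and assumes certainty is scaled from 0 (grey) to 1 (pure hue). HSV saturation is only a device-dependent stand-in for the perceptual dimension Morrison describes; a perceptually based system such as Munsell chroma would be a closer match.

```python
import colorsys

def certainty_to_rgb(hue_deg: float, certainty: float, value: float = 0.9) -> tuple[int, int, int]:
    """Map a class hue and a certainty score to an 8-bit RGB triple.

    Saturation is set equal to certainty, so certain data get the pure hue and
    completely uncertain data collapse to a neutral grey of the given value.
    """
    if not 0.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in [0, 1]")
    r, g, b = colorsys.hsv_to_rgb((hue_deg % 360) / 360.0, certainty, value)
    return (round(r * 255), round(g * 255), round(b * 255))

# Hypothetical land cover class drawn in red (hue 0) at three certainty levels.
for c in (1.0, 0.5, 0.0):
    print(c, certainty_to_rgb(0, c))
```

A land cover class drawn in red would thus shift from pure red for well-established pixels toward a neutral grey where the classification is doubtful, while hue remains free to carry the class itself.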
Symbol focus can be manipulated in at least four ways:
a) Contour crispness - The most obvious way to apply focus is to vary the "crispness" or "fuzziness" of symbol contours (edges). A certain boundary (e.g., the U.S.-Canada border) might be depicted with a sharp, narrow line, while an uncertain boundary (e.g., that between Kuwait and Iraq) might be portrayed with a broad, fuzzy line that fades from the center toward the background (fig. 3). Similar "out-of-focus" symbols could be used to represent the certain or uncertain location of point features, and an area may be depicted as not bounded at all, but as fading continuously from core to periphery (fig. 4).
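Contour crispness can be treated as an opacity profile across the boundary. In this minimal sketch, which is an assumption rather than a method from the text, a certain boundary is a crisp line of fixed half-width, while an uncertain one fades with a Gaussian falloff whose standard deviation, in map units, is set equal to the positional uncertainty.

```python
import math

def boundary_opacity(distance: float, uncertainty: float, crisp_halfwidth: float = 0.5) -> float:
    """Opacity (0 to 1) of a boundary symbol at `distance` map units from its centerline.

    uncertainty == 0 gives a sharp line of half-width `crisp_halfwidth`;
    uncertainty > 0 gives a Gaussian fade with that standard deviation.
    """
    if uncertainty < 0:
        raise ValueError("uncertainty must be non-negative")
    d = abs(distance)
    if uncertainty == 0:
        return 1.0 if d <= crisp_halfwidth else 0.0
    return math.exp(-0.5 * (d / uncertainty) ** 2)

# Cross-sections of a certain boundary and an uncertain one (hypothetical units).
for u in (0.0, 2.0):
    profile = [round(boundary_opacity(x, u), 2) for x in range(-4, 5)]
    print(f"uncertainty={u}: {profile}")
```

A larger uncertainty produces the broader, softer line suggested for the Kuwait-Iraq case; the same profile, applied radially, would give the point and area symbols that fade from core to periphery.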





A second continuum relates to the character of variation in the phenomenon across space. Some phenomena (e.g., tax rates) can vary quite abruptly as political boundaries are crossed, while others (e.g., gallons of ground water pumped for irrigation per county) can exhibit a relatively smooth variation quite independent of the units to which data are aggregated. MacEachren and DiBiase (1991) recently proposed a series of graphic data models that represent locations in this continuity-abruptness phenomenon space (fig. 8). These graphic data models correspond to a range of two-dimensional symbolization methods, which include standard forms such as dot, choropleth, isopleth, and graduated symbol, along with some hybrid techniques designed to deal with the midpoints on the phenomenon space axes (fig. 9).

a) map pairs in which a data map is depicted side-by-side with a map of uncertainty about that data (fig. 10);



In relation to the third possibility, bivariate maps, the U.S. Census Bureau's bivariate choropleth maps from the 1970 census are perhaps the best-known attempt to relate two variables on one map. Experimentation with those maps by several researchers indicates that untrained readers have considerable trouble reading bivariate maps (e.g., Olson, 1981). There are, however, a number of bivariate mapping possibilities that have not yet been investigated, and previous attempts to use bivariate maps dealt with two different variables rather than with a single variable and its uncertainty. Color saturation (or intensity), for example, might be used as a graphic variable for depicting uncertainty on maps in which different hues are used to represent the data values of interest (e.g., on a land cover map). For printed maps in black and white, a combination of texture and value may be effective (see fig. 12). The variable of focus might be used in similar ways.
In a dynamic visualization environment, it would be possible to combine sequencing and bivariate techniques and allow a fade from a data map, through an uncertainty map, back to the data map. For qualitative areal data (e.g., soils), Fisher (1991) has suggested an animated technique to communicate the certainty (or uncertainty) of soil classifications for particular locations. In his visualization system, the duration for which pixels are displayed in a particular color is matched to the probability of that pixel being in a particular soil classification. Certain sections of the map remain static, while uncertain sections blink continually between (or among) the potential soils for that place.
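The core of Fisher's idea, display time proportional to class probability, can be sketched as follows. This is not Fisher's implementation: the class names, probabilities, and 20-frame cycle are hypothetical, largest-remainder rounding is used to turn probabilities into whole frame counts, and a smooth weighted round-robin interleaves the classes so that an uncertain pixel blinks rather than showing each class in one long run.

```python
def allocate_frames(probabilities: dict[str, float], frames: int = 20) -> dict[str, int]:
    """Split `frames` among classes in proportion to probability (largest remainder)."""
    if any(p < 0 for p in probabilities.values()):
        raise ValueError("probabilities must be non-negative")
    total_p = sum(probabilities.values())
    if total_p <= 0:
        raise ValueError("at least one probability must be positive")
    exact = {k: frames * p / total_p for k, p in probabilities.items()}
    counts = {k: int(v) for k, v in exact.items()}  # floor of each share
    leftover = frames - sum(counts.values())
    # Hand the remaining frames to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

def blink_sequence(counts: dict[str, int]) -> list[str]:
    """Interleave classes so each appears counts[k] times per cycle (smooth weighted round-robin)."""
    total = sum(counts.values())
    current = {k: 0 for k in counts}
    sequence = []
    for _ in range(total):
        for k, weight in counts.items():
            current[k] += weight
        pick = max(current, key=current.get)
        current[pick] -= total
        sequence.append(pick)
    return sequence

# Hypothetical soil-class probabilities for one uncertain pixel.
pixel = {"loam": 0.7, "clay": 0.2, "sand": 0.1}
counts = allocate_frames(pixel)
print(counts)
print(blink_sequence(counts))
```

A pixel whose probability is 1.0 for a single class receives every frame and therefore stays static, which is the behavior described for certain sections of the map.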
This possibility may tempt some of us to go back to our roots in the communication model approach to cartography. Communication of data quality or uncertainty seems to be the ideal case for which the communication model was developed. Uncertainty can be treated as a precisely defined piece of information that we want a GIS user to obtain. I am afraid, however, that if we follow this narrow information theory approach we will hit the same dead ends that we did a decade or so ago.
This time around we need to be aware of the range of human-user interactions with graphics that occur from initial data exploration to presentation. For exploratory applications, where there is no predetermined message to communicate, we cannot judge uncertainty depictions using communication effectiveness standards. We can only evaluate these depictions in terms of how they alter decision making, pattern recognition, hypothesis generation, or policy decisions. We must also be aware that our (possibly) precise uncertainty information is conditioned by the social-cultural context in which decisions about what to represent are made (e.g., a variety of estimates exist about the reliability of the U.S. Census Bureau's enumeration of homeless persons), and of the limited ability of cartographers to determine the relative importance of various kinds of quality or uncertainty information in a particular context.
In addition to the question of visualizing uncertainty, there is also the question of the quality of visualizations to consider. One way to evaluate tools for visualizing uncertainty, therefore, is to calibrate them in terms of their tendency toward type I and type II visualization errors (MacEachren and Ganter, 1990). Does providing uncertainty information (or providing it in a particular way) lead to a failure to notice patterns and relationships that exist (type II) or to a tendency to see patterns that do not exist (type I)? Maps are re-presentations and, as such, are always one choice among many about how to re-present. Because there is always uncertainty in the choice of representation method, representing the uncertainty in our representations is an uncertain endeavor at best.
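If the patterns in a test display can be enumerated, the two error types can be tallied directly. The sketch below assumes, as a simplification not drawn from MacEachren and Ganter (1990), that both the patterns a reader reports and the patterns actually present are available as sets of labels; the labels are hypothetical. Comparing the tallies for the same data shown with and without uncertainty information would indicate which way a given depiction pushes readers.

```python
def visualization_errors(reported: set[str], actual: set[str]) -> dict[str, int]:
    """Tally type I and type II visualization errors for one reader and one display."""
    return {
        "type_I": len(reported - actual),   # patterns seen that do not exist
        "type_II": len(actual - reported),  # patterns that exist but were missed
        "correct": len(reported & actual),  # real patterns that were noticed
    }

# Hypothetical labels for patterns in a test map.
actual = {"cluster_northeast", "east_west_gradient", "river_corridor"}
reported = {"cluster_northeast", "east_west_gradient", "outlier_south"}
print(visualization_errors(reported, actual))
```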
2. This variable appears to have been originally suggested by David Woodward in a seminar at Wisconsin (D. DiBiase and J. Krygier, personal communication).
3. This idea was offered by Michael Goodchild during the NCGIA Visualization of Data Quality Specialist Meeting.
DiBiase, David W. 1990. Scientific visualization in the earth sciences, Earth and Mineral Sciences (Bulletin of the College of Earth and Mineral Sciences, The Pennsylvania State University), 59(2): 13-18.
DiBiase, David W. and Krygier, John 1992. (personal communication).
Fisher, Peter 1991. Modeling and visualizing uncertainty in geographic data, Position paper for the Visualization of Data Quality Specialist's Meeting, NCGIA Initiative 7, Castine, ME, June 8-12.
Fryman, James F. and Sines, Bonnie R. 1991. Anatomy of the introductory cartography course, Cartographic Perspectives, 8: 4-10.
Jenks, George F. 1967. The data model concept in statistical mapping. International Yearbook of Cartography, 7, 186-190.
Langford, M. and Unwin, D. J. 1991. Generating and mapping population density surfaces within a geographical information system, Cartographic Journal, (in press).
MacEachren, Alan M. 1985. Accuracy of Thematic Maps: Implications of Choropleth Symbolization, Cartographica, 22(1): 38-58.
MacEachren, Alan (in collaboration with Barbara Buttenfield, James Campbell, David DiBiase, and Mark Monmonier) Visualization, in Abler, Marcus, and Olson (eds.) Geography's Inner Worlds (in press).
MacEachren, Alan M. and Davidson, John H. 1987. Continuous Geographic Surfaces: Sample Resolution, Intermediate Value Estimation Accuracy, and Isometric Mapping, American Cartographer, 14(4): 299-320.
MacEachren, Alan M. and DiBiase, David W. 1991. Animated maps of aggregated data: Conceptual and practical problems, Cartography and Geographic Information Systems (in press).
MacEachren, Alan M. and Ganter, John H. 1990. A pattern identification approach to cartographic visualization. Cartographica, 27(2): 64-81.
Merriam-Webster 1974. Webster's New Collegiate Dictionary, Springfield, MA: G. & C. Merriam Company, p. 905.
Moellering, H., Fritz, L., Nyerges, T., Liles, B., Chrisman, N., Poeppelmeier, C., Schmidt, W. and Rugg, R. (executive committee) 1988. The Proposed National Standard for Digital Cartographic Data, The American Cartographer, 15(1): 9-142.
Monmonier, Mark 1992. Authoring Graphic Scripts: Experiences and Principles, Cartography and Geographic Information Systems (in press).
Monmonier, Mark and Johnson, Branden B. 1990. Design Guide for Environmental Maps, Trenton, NJ: New Jersey Department of Environmental Protection, Division of Science and Research.
Morgan, M. Granger and Henrion, Max 1990. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge: Cambridge University Press.
Muller, Jean-Claude and Zeshen, 1990. A knowledge based system for cartographic symbol design, The Cartographic Journal, 27(2): 24-30.
National Center for Geographical Information and Analysis, The research plan of the National Center for Geographic Information and Analysis, International Journal of Geographical Information Systems, ????.
Olson, Judy M. 1981. Spectrally encoded two-variable maps. Annals of the Association of American Geographers, 71(2): 259-276.
Rejeski, David and Kapuscinski, Jacques 1990. Risk modeling with geographic information systems: Approaches and Issues, Report of the U.S. Environmental Protection Agency, Office of Information Resources Management.
Robinson, A. H., Sale, R. D., Morrison, J. L., and Muehrcke, P. C. 1984. Elements of Cartography, fifth edition, New York: John Wiley & Sons.
Slocum, Terry A. 1988. Developing an information system for choropleth maps. Proceedings of the Third International Symposium on Spatial Data Handling, August 17-19, 1988, Sydney, Australia, 293-305.
Taylor, D. R. F. 1982. The cartographic potential of Telidon, Cartographica, 19(3&4): 18-30.
Weibel, Robert and Buttenfield, Barbara P. 1988. Map Design for Geographic Information Systems, GIS/LIS'88 Proceedings: Assessing the World, Volume I, San Antonio: ACSM, ASPRS, AAG, URISA, 350-359.
Alan M. MacEachren is a Professor in the Department of Geography, The Pennsylvania State University. E-mail contacts are encouraged [alan@essc.psu.edu].
Alan M. MacEachren
302 Walker
Dept. of Geography
The Pennsylvania State University
University Park, PA 16802
E-Mail: alan@essc.psu.edu
fax at: (814) 863-7943