GeoComputation 99

 

Representing Uncertainty of Qualitative Thematic Maps with an Inter-Map Cell Swapping Heuristic

Charles R. Ehlschlaeger
Dept. of Geography, Hunter College, NY 10021, USA

Abstract

Monte Carlo simulation, a technique that generates many versions of possible application results, is a popular method for representing application uncertainty. It requires a set of application input maps each of which consists of a possible representation of reality. Representing the total distribution of potential application results during the Monte Carlo process requires two critical components: a probability model for each class in the qualitative thematic map, and a model of spatial autocorrelation for these same classes. The probability and spatial autocorrelation models described in this paper assume a generalized thematic map of the study area is available. Samples of qualitative thematic information accurate enough and with a resolution fine enough to achieve useful application results are also required. A generalized map, for the purpose of this paper, is a map containing qualitative thematic information at a resolution too coarse and/or attribute information too inaccurate to be useful for a particular application. A conflation technique combining the information from the generalized map and samples of application quality data defines a qualitative thematic probability distribution while the samples of application quality data define the spatial autocorrelation desired in the set of Monte Carlo input maps. This paper introduces a combination of techniques including an inter-map cell swapping algorithm, a class probability model, and a new spatial statistic. Together, these techniques allow for the generation of spatially autocorrelated random qualitative thematic maps for the purpose of representing spatial application uncertainty.

Key words: uncertainty, thematic maps, conflation, conditional simulation, unconditional simulation, geostatistics, and Monte Carlo simulation.

Source Code and Data

Probability modeling


(1)
In this equation, M(U) is a realization of a qualitative thematic map, p1(U) through pn(U) are the class probability maps, and z1(U) and zj(U), j = 2...n-1, are uniformly distributed random fields, one for each class except the final class, which is represented by the "left over" cells.
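As a rough illustration of how a realization can be drawn from class probability maps and per-class uniform fields, the Python sketch below tests classes one at a time and gives the final class whatever cells are left over. The sequential conditional test (a class is accepted when its uniform value falls below its probability divided by the probability not yet ruled out) is an assumption about the form of equation (1), not a transcription of it. The function name and the flat-list map layout are hypothetical, and the uniform fields here are uncorrelated for brevity.

```python
import random

def sample_class_map(prob_maps, rng):
    """Draw one class map from per-class probability maps (illustrative only).

    prob_maps[k][i] is the probability of class k at cell i; the
    probabilities at each cell are assumed to sum to 1.
    """
    n_classes = len(prob_maps)
    n_cells = len(prob_maps[0])
    # One uniform random field per class except the last (uncorrelated here).
    z = [[rng.random() for _ in range(n_cells)] for _ in range(n_classes - 1)]
    result = []
    for i in range(n_cells):
        remaining = 1.0            # probability mass not yet ruled out
        assigned = n_classes - 1   # default: the "left over" final class
        for k in range(n_classes - 1):
            p = prob_maps[k][i]
            # Assumed rule: accept class k with its conditional probability,
            # given that every earlier class was rejected at this cell.
            if remaining > 0 and z[k][i] < p / remaining:
                assigned = k
                break
            remaining -= p
        result.append(assigned)
    return result

# Three cells with certain outcomes, so the draw does not depend on the seed.
example = sample_class_map(
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]], random.Random(0))
```

Under this assumed rule the marginal probability of each class at a cell equals its entry in the probability maps, because the conditional tests multiply out to the original probabilities. Replacing the uncorrelated draws in z with correlated random fields would leave that property intact while changing the spatial pattern.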

Modeling spatial dependence

The following figure contains the densogram for the application quality (40m resolution) geology map used in this case study.
 
Figure One, Geology Densogram

Probability Model

(2)

(3)

In the equations above, Aj is a sampled point of application quality data, Mi is a point on the study area map, di,j is the distance between points i and j, L is the lag interval, c is a class represented in the samples of application quality data, and l is the lag.

(4)

The cell probability equation, equation (4), determines its probability vector by normalizing all class potentials based on the cell's distance to the generalized map class polygon. In the equation above, Mi and Mj are points on the study area map, di,j is the distance between points i and j, L is the lag interval, b is a class in the study area map, Cm is the set of all classes in the study area map, c is a class represented in the samples of application quality data, and l is the lag.

Densogram Spatial Statistic

The densogram equation is:

(5)

In the equation above, Ai and Aj are sampled points of application quality data, di,j is the distance between points i and j, L is the lag interval, c is a class represented in the samples of application quality data, and l is the lag.
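To make the lag-binning idea concrete, the Python sketch below computes a lag density for one class from a list of sampled points. It uses an assumed definition: among sample pairs whose separation falls within the lag interval L centered on lag l, and whose first point is class c, the share whose second point is also class c. This is one plausible reading of the variable list above and is not a reproduction of equation (5); the function name and the (x, y, class) tuple layout are hypothetical.

```python
import math

def lag_density(points, cls, lag, lag_interval):
    """Assumed lag density of class `cls` at distance `lag` (illustrative only).

    points is a list of (x, y, class_label) samples. Returns None when
    no pair anchored on `cls` falls inside the lag bin.
    """
    half = lag_interval / 2.0
    same = total = 0
    for i, (xi, yi, ci) in enumerate(points):
        if ci != cls:
            continue  # only pairs anchored on the class of interest
        for j, (xj, yj, cj) in enumerate(points):
            if i == j:
                continue
            d = math.hypot(xi - xj, yi - yj)
            # Keep pairs whose distance lies in [lag - L/2, lag + L/2).
            if lag - half <= d < lag + half:
                total += 1
                if cj == cls:
                    same += 1
    return same / total if total else None

# Four samples on a line, 40 m apart: two of class "a", then two of class "b".
samples = [(0, 0, "a"), (40, 0, "a"), (80, 0, "b"), (120, 0, "b")]
near = lag_density(samples, "a", lag=40, lag_interval=40)
far = lag_density(samples, "a", lag=80, lag_interval=40)
```

In this toy transect, near is 2/3 (two of the three 40 m pairs anchored on class "a" end on class "a") and far is 0.0, which is the kind of decay with distance that a plot of lag density against lag would show.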

Spatial Dependence Model

This section describes the spatial dependence model used for this case study. The spatial dependence model uses a combination of a random field model and an inter-map cell swapping heuristic. Combining the random field model and a cell swapping heuristic cancels out the main disadvantages of each methodology.

Random field models can create reasonably precise representations of goal surfaces in a comparatively short period of time. However, random field models require several parameters to be determined in order to provide a good fit. Variogram models have sill, range, and nugget parameters, and the random field model used in this case study (from Ehlschlaeger & Goodchild, 1994a) needs three parameters to achieve a reasonable fit. Cell swapping heuristics, on the other hand, require no parameters. However, they consume many CPU cycles to complete, and the resulting solutions can fit poorly. The following figures illustrate this behavior.
 

Figure Two, Densograms of Several Classes
Figure two shows two classes, a best case and a worst case, from a typical realization (chosen at random). The lines labeled (goal) indicate the lag densities desired. The dashed lines (cor) are the lag densities created by correlated random fields before any cell swapping. The dotted lines (uncor) are the lag densities created by uncorrelated random fields before any cell swapping.
Figure Three, Errors of Uncorrelated Realization before Inter-map Swapping
Figure Five, Errors of Correlated Realization before Inter-map Swapping
Figure Four, Errors of Uncorrelated Realization after Inter-map Swapping
Figure Six, Errors of Correlated Realization after Inter-map Swapping
Figures three through six illustrate the lag densities of a typical realization before (figures three & five) and after (figures four & six) cell swapping. Figures three and four show swapping applied to uncorrelated random fields, while figures five and six use correlated random fields. In both swapping sets, only lag densities out to 160m were fitted, using a least-squares fit. Notice that the uncorrelated random fields could, at best, achieve a density error of 0.01 for rook's-case adjacent cells after swapping (figure four). Using correlated random fields, density errors were in the range of 0.0001-0.0005 for rook's-case adjacent cells after swapping (figure six).
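The general shape of a swapping heuristic driven by a least-squares density error can be sketched in a few lines of Python. The text above does not spell out the swap rule, so the version below makes several assumptions that should not be read as the paper's algorithm: a swap exchanges the class of one cell location between two realizations, the error is the summed squared difference between each map's adjacent-cell density and a goal density, only the shortest lag is fitted, maps are 1-D lists, and a swap is kept only when it lowers the error. All function names and the goal values are hypothetical.

```python
import random

def adjacent_density(cells, cls):
    """Share of the neighbours of `cls` cells that are also `cls` (1-D, lag 1)."""
    same = total = 0
    for i, c in enumerate(cells):
        if c != cls:
            continue
        for j in (i - 1, i + 1):
            if 0 <= j < len(cells):
                total += 1
                same += cells[j] == cls
    return same / total if total else 0.0

def squared_error(maps, goal):
    """Sum over maps and classes of (observed density - goal density) squared."""
    return sum((adjacent_density(m, c) - g) ** 2
               for m in maps for c, g in goal.items())

def greedy_swap(maps, goal, rng, max_tries=2000):
    """Swap one cell location between two maps whenever that lowers the error."""
    err = squared_error(maps, goal)
    for _ in range(max_tries):
        a, b = rng.sample(range(len(maps)), 2)   # two distinct realizations
        i = rng.randrange(len(maps[0]))          # one cell location
        if maps[a][i] == maps[b][i]:
            continue                             # swapping would change nothing
        maps[a][i], maps[b][i] = maps[b][i], maps[a][i]
        new_err = squared_error(maps, goal)
        if new_err < err:
            err = new_err                        # keep the improving swap
        else:
            maps[a][i], maps[b][i] = maps[b][i], maps[a][i]  # undo
    return err

rng = random.Random(0)
realizations = [[rng.randrange(2) for _ in range(30)] for _ in range(4)]
goal = {0: 0.8, 1: 0.8}  # hypothetical goal densities for two classes
error_before = squared_error(realizations, goal)
error_after = greedy_swap(realizations, goal, rng, max_tries=500)
```

Because each swap exchanges values at the same location, the number of realizations holding each class at a given cell never changes, so a per-cell probability model is preserved across the set while the spatial pattern within each map is adjusted. Recomputing the full error after every trial swap, as done here for clarity, is also a reminder of why such heuristics can be costly on large maps.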

All tests were run on a 200 MHz Pentium computer. The probability distribution required 15 minutes to complete. The random field estimator and generator took 9.5 hours to construct correlated random fields with less error than the cell swapping heuristic achieved with uncorrelated random fields as its inputs. The cell swapping heuristic with uncorrelated random fields required 47 hours of swapping time to complete. Thus, the uncorrelated cell swapping heuristic was slower and less accurate than the random field model (even without manual fitting). The correlated cell swapping heuristic was the slowest of all tests: finding the optimal solution took 547.5 hours of processing time, even though it swapped only 41.5% as many cells as the uncorrelated cell swapping heuristic.
 
Figure Seven, Class Errors of Initial Noise
Figure Eight, Class Errors after Swapping fitted to 1600m.

 
Figure Nine, Class Errors after Swapping fitted to 160m.