Charles R. Ehlschlaeger
Dept. of Geography, Hunter College, NY
10021, USA
Abstract
Monte Carlo simulation, a technique that generates many versions of possible
application results, is a popular method for representing application uncertainty.
It requires a set of application input maps, each of which is a
possible representation of reality. Representing the total distribution
of potential application results during the Monte Carlo process requires
two critical components: a probability model for each class in the qualitative
thematic map, and a model of spatial autocorrelation for these same classes.
The probability and spatial autocorrelation models described in this paper
assume a generalized thematic map of the study area is available. Samples
of qualitative thematic information accurate enough and with a resolution
fine enough to achieve useful application results are also required. A
generalized map, for the purpose of this paper, is a map containing qualitative
thematic information at a resolution too coarse and/or attribute information
too inaccurate to be useful for a particular application. A conflation
technique combining the information from the generalized map and samples
of application quality data defines a qualitative thematic probability
distribution while the samples of application quality data define the spatial
autocorrelation desired in the set of Monte Carlo input maps. This paper
introduces a combination of techniques including an inter-map cell swapping
algorithm, a class probability model, and a new spatial statistic. Together,
these techniques allow for the generation of spatially autocorrelated random
qualitative thematic maps for the purpose of representing spatial application
uncertainty.
Key words: uncertainty, thematic maps, conflation, conditional simulation,
unconditional simulation, geostatistics, and Monte Carlo simulation.
Probability modeling
- Conceptual errors in a map are either:
  - unmapped inclusions (probably smaller than the minimum mapping unit of
    a well-formed map) or
  - a misclassification of a class polygon (of a less than perfect map).
- Qualitative thematic map errors are usually represented as a probability
  vector for each cell in the study area.
- Probability vectors can be used to represent the transition between two
  classes if there are cartographic errors.
(1)
In this equation, M(U) is a realization of a qualitative
thematic map, p_1(U) through p_n(U) are the class probability maps,
and z_1(U) along with z_j(U), j = 2...n-1, are uniformly
distributed random fields, one for each class except the final class, which
is represented by the "left over" cells.
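The realization step described above can be illustrated with a short Python sketch. Equation (1) itself is not reproduced in this text, so the code below is only one plausible reading of it: each class j in turn claims a cell when its uniform field value falls below the class probability, rescaled by the probability mass not yet rejected, and any cell left unclaimed receives the final class. The function name `realize_map` and the array layout are assumptions made for illustration.

```python
import numpy as np

def realize_map(prob, z):
    """Draw one thematic map realization from class probability maps.

    prob: array (n_classes, rows, cols); per-cell probabilities summing to 1.
    z:    array (n_classes - 1, rows, cols); uniform random fields in [0, 1).
    Returns an integer array (rows, cols) of class indices 0..n_classes-1.
    NOTE: sequential thresholding is an assumed reading of equation (1).
    """
    n = prob.shape[0]
    # Cells never claimed by an earlier class keep the final class ("left over").
    result = np.full(prob.shape[1:], n - 1, dtype=int)
    unassigned = np.ones(prob.shape[1:], dtype=bool)
    remaining = np.ones(prob.shape[1:])  # probability mass not yet rejected
    for j in range(n - 1):
        # Probability of class j given that classes 0..j-1 were rejected.
        cond = np.divide(prob[j], remaining,
                         out=np.zeros_like(remaining),
                         where=remaining > 1e-12)
        take = unassigned & (z[j] < cond)
        result[take] = j
        unassigned &= ~take
        remaining = remaining - prob[j]
    return result

# Usage: three classes with the same probability vector in every cell.
rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])
prob = np.broadcast_to(p[:, None, None], (3, 200, 200)).copy()
z = rng.random((2, 200, 200))  # uncorrelated fields, for simplicity
realization = realize_map(prob, z)
```

With uncorrelated fields the class frequencies in `realization` approach the probability vector; supplying spatially autocorrelated fields for `z` would leave the frequencies unchanged while clustering the classes.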
- The greater the spatial dependence, the larger the inclusions are.
Modeling spatial dependence
- The fields of geostatistics and landscape pattern metrics have both provided
  useful spatial statistics for qualitative thematic maps.
- Correlograms are one method to help define spatial dependence and spatial
  distribution in geostatistics. While variograms are popular in conditional
  simulation applications, correlograms provide advantages for modeling spatial
  uncertainty, normally a stochastic simulation application.
- Landscape pattern metrics, a set of measures defining land-cover map structure,
  can be used to describe many, if not all, qualitative thematic map patterns.
  Different landscape pattern metrics measure connectedness, diversity, shape
  complexity, and size.
- Choosing the appropriate landscape pattern metric or geostatistical measure
  is difficult. Monte Carlo simulation requires a set of application
  input maps in which each input map consists of a possible representation
  of reality.
- There are two conceptual methods for constructing spatial dependence models:
  random field models and cell swapping heuristics.
- Random field models use a set of parameters to define the random field
  function.
- An intra-map cell swapping heuristic swaps two random cells in a realization
  map if the resulting map more closely matches the goal spatial
  statistic.
- A cell swapping heuristic's significant advantage over random field models
  is that swapping cells can create random fields mimicking any spatial statistic.
- A generalized map, for the purpose of this paper, is a map containing qualitative
  thematic information at a resolution too coarse and/or attribute information
  too inaccurate to be useful for a particular application.
- Application quality data is qualitative thematic information accurate enough
  and with a resolution fine enough to achieve useful application results.
- The requirement that a generalized map and samples of higher quality data
  be available is both a potential solution to a classic uncertainty issue
  and a practical way to understand the impact of higher quality data.
- Maps are, by definition, generalized. Imprecise or nonexistent map error
  statements, as well as the lack of consensus on how to precisely describe
  map uncertainty, make it doubtful whether a map is accurate enough
  for a particular spatial application.
- Decision makers can only be certain of application results with well-known
  applications using standardized maps.
- A probability vector is a set of values representing the likelihood that each
  class is located in a particular cell.
- The generalized map is re-sampled to the resolution of the application
  quality data.
- A densogram measures the proportion of each application quality cell’s
  class at various distances from cells of the same class.
The following figure contains the densogram for the application quality
(40m resolution) geology map used in this case study.
- The inter-map cell swapping heuristic begins with a set of random thematic
  maps.
- Each cell within each random thematic map is given a class value based
  on the probability vector defined by the location of class polygons in
  the surrounding generalized map.
- Cells are swapped at the same location from different maps if the resulting
  maps’ densograms provide a better overall fit to the application quality
  data’s densogram.
- The heuristic continues until no more cell swaps are possible.
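The inter-map swapping steps listed above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the case study: it fits only the rook's-case (adjacent cell) lag rather than a full densogram, proposes swaps at random for a fixed number of attempts instead of running until no improving swap remains, and all function names are hypothetical. Because a swap exchanges values at the same location in two maps, the number of maps holding each class at every cell is unchanged, so the per-cell class frequencies set by the probability model are preserved.

```python
import numpy as np

def rook_density(grid, n_classes):
    """For each class, the fraction of rook neighbours of its cells sharing the class."""
    same = np.zeros(n_classes)
    total = np.zeros(n_classes)
    # Horizontal and vertical neighbour pairs; each pair is seen from both ends.
    for a, b in ((grid[:, :-1], grid[:, 1:]), (grid[:-1, :], grid[1:, :])):
        for c in range(n_classes):
            total[c] += (a == c).sum() + (b == c).sum()
            same[c] += 2 * ((a == c) & (b == c)).sum()
    return np.divide(same, total, out=np.zeros(n_classes), where=total > 0)

def map_error(grid, goal, n_classes):
    """Squared error between one map's lag densities and the goal densities."""
    return float(((rook_density(grid, n_classes) - goal) ** 2).sum())

def fit_error(maps, goal, n_classes):
    """Total squared error over the whole set of maps."""
    return sum(map_error(m, goal, n_classes) for m in maps)

def inter_map_swap(maps, goal, n_classes, attempts, rng):
    """Greedy swapping: keep a swap only if it lowers the combined error."""
    maps = [m.copy() for m in maps]
    rows, cols = maps[0].shape
    errs = [map_error(m, goal, n_classes) for m in maps]
    accepted = 0
    for _ in range(attempts):
        a, b = rng.choice(len(maps), size=2, replace=False)
        r, c = rng.integers(rows), rng.integers(cols)
        if maps[a][r, c] == maps[b][r, c]:
            continue  # swapping identical classes changes nothing
        maps[a][r, c], maps[b][r, c] = maps[b][r, c], maps[a][r, c]
        new_a = map_error(maps[a], goal, n_classes)
        new_b = map_error(maps[b], goal, n_classes)
        if new_a + new_b < errs[a] + errs[b]:
            errs[a], errs[b] = new_a, new_b
            accepted += 1
        else:
            # Undo the swap: it did not improve the fit.
            maps[a][r, c], maps[b][r, c] = maps[b][r, c], maps[a][r, c]
    return maps, accepted

# Usage: six random two-class maps pushed toward strongly clustered classes.
rng = np.random.default_rng(1)
start = [rng.integers(0, 2, size=(10, 10)) for _ in range(6)]
goal = np.array([0.8, 0.8])  # hypothetical goal densities for the adjacent-cell lag
swapped, accepted = inter_map_swap(start, goal, 2, attempts=1500, rng=rng)
```

Recomputing the statistic for the whole map after every proposal keeps the sketch short but is wasteful; updating only the neighbourhood of the swapped cell would be the obvious refinement.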
Probability Model
- The choice of probability model should depend on the types of errors
  most prevalent in the study.
- When the generalized map's minimum mapping unit (MMU) is many orders of
  magnitude larger than the MMU of the application quality data, thematic
  errors will be much more prevalent than positional errors.
- Because cartographic errors are the bulk of errors in this case study,
  the likelihood that a cell contains a specific class is based on how far
  the cell is from the more generalized polygon edge.
- Conceptually, this probability model creates buffers around as well as
  within class polygons and determines the density of that class in each
  buffer. Class density within a distance buffer, or lag, determines the
  potential that the class exists at that distance.
- Each realization cell determines its probability vector by normalizing
  all class potentials based on the cell's distance to the generalized map
  class polygon.
(2)
(3)
In the equations above, A_j is a sampled point
of application quality data, M_i is a point on
the study area map, d_{i,j} is the distance between points
i and j, L is the lag interval, c is a class represented
in the samples of application quality data, and l is the lag.
(4)
The cell probability equation, equation (4), determines a cell's probability
vector by normalizing all class potentials based on the cell's distance
to the generalized map class polygon. In the equation above, M_i
and M_j are points on the study area map, d_{i,j}
is the distance between points i and j, L is the lag
interval, b is a class in the study area map, C_m
is the set of all classes in the study area map, c is a class represented
in the samples of application quality data, and l is the lag.
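The buffer-and-normalize idea behind equations (2) through (4) can be sketched in Python. The equations themselves are not reproduced in this text, so the code rests on stated assumptions: the generalized map and the samples share one class legend, the potential of class c at a lag is the fraction of samples in that signed-distance buffer (negative inside the class polygon, positive outside) that belong to c, and a cell's probability vector is the set of potentials divided by their sum. The function names, the brute-force distance computation, and the uniform fallback for cells with no potential are illustrative choices only.

```python
import numpy as np

def signed_edge_distance(mask, cell_size=1.0):
    """Signed distance from each cell to the class edge (brute force, small rasters).

    Negative inside the class, positive outside; infinite if no edge exists.
    """
    rows, cols = mask.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    pts = np.column_stack([yy.ravel(), xx.ravel()]).astype(float)
    inside = mask.ravel()
    dist = np.empty(pts.shape[0])
    for flag, sign in ((True, -1.0), (False, 1.0)):
        src = pts[inside == flag]   # cells whose distance is wanted
        dst = pts[inside != flag]   # cells on the other side of the edge
        if len(src) == 0:
            continue
        if len(dst) == 0:
            dist[inside == flag] = sign * np.inf
            continue
        diff = src[:, None, :] - dst[None, :, :]
        nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        dist[inside == flag] = sign * nearest * cell_size
    return dist.reshape(rows, cols)

def class_probabilities(general, samples, n_classes, lag_interval,
                        max_bins=10, cell_size=1.0):
    """Per-cell probability vectors from a generalized map and sparse samples.

    general: integer raster of generalized classes.
    samples: integer raster of application quality classes, -1 where unsampled.
    Returns an array (n_classes, rows, cols) whose vectors sum to 1.
    """
    rows, cols = general.shape
    sampled = samples >= 0
    potential = np.zeros((n_classes, rows, cols))
    for c in range(n_classes):
        d = signed_edge_distance(general == c, cell_size)
        # Lag bin of every cell, shifted so that bin indices start at zero.
        lag = np.clip(np.floor(d / lag_interval), -max_bins, max_bins - 1)
        lag = lag.astype(int) + max_bins
        total = np.bincount(lag[sampled], minlength=2 * max_bins)
        hits = np.bincount(lag[sampled & (samples == c)], minlength=2 * max_bins)
        density = np.divide(hits, total, out=np.zeros(2 * max_bins), where=total > 0)
        potential[c] = density[lag]  # class potential at each cell's lag
    norm = potential.sum(axis=0)
    safe = np.where(norm > 0, norm, 1.0)
    # Assumed fallback: equal probabilities where no class has any potential.
    return np.where(norm > 0, potential / safe, 1.0 / n_classes)

# Usage: a two-class generalized map sampled everywhere without error.
general = np.zeros((8, 8), dtype=int)
general[:, 4:] = 1
prob = class_probabilities(general, general.copy(), n_classes=2, lag_interval=1.0)
```

In this degenerate usage the samples agree with the generalized map, so every cell gets probability 1 for its mapped class; with real samples the densities near polygon edges would fall between 0 and 1.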
Densogram Spatial Statistic
The densogram equation is:
(5)
In the equation above, A_i and A_j
are sampled points of application quality data, d_{i,j}
is the distance between points i and j, L is the lag
interval, c is a class represented in the samples of application
quality data, and l is the lag.
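Since equation (5) is not reproduced in this text, the following Python sketch computes a densogram from the verbal definition only: for a class c and each lag, it reports the fraction of cells lying at that distance from a class c cell that are themselves class c. The half-open lag bins [l, l + L), the raster (rather than point sample) input, and the function name `densogram` are assumptions for illustration.

```python
import numpy as np

def densogram(grid, cls, lag_interval, max_lag, cell_size=1.0):
    """Same-class density per distance lag for one class of a raster map.

    Returns (lags, density); density[k] covers distances in
    [lags[k], lags[k] + lag_interval) and is NaN for bins with no cell pairs.
    """
    is_c = (grid == cls)
    rows, cols = grid.shape
    n_bins = int(np.ceil(max_lag / lag_interval))
    same = np.zeros(n_bins)
    total = np.zeros(n_bins)
    reach = int(max_lag // cell_size)
    for dy in range(-reach, reach + 1):
        for dx in range(-reach, reach + 1):
            if (dy == 0 and dx == 0) or abs(dy) >= rows or abs(dx) >= cols:
                continue
            k = int((cell_size * np.hypot(dy, dx)) // lag_interval)
            if k >= n_bins:
                continue
            # Overlapping windows: a[i, j] is paired with the cell at (i + dy, j + dx).
            a = is_c[max(0, -dy):rows - max(0, dy), max(0, -dx):cols - max(0, dx)]
            b = is_c[max(0, dy):rows - max(0, -dy), max(0, dx):cols - max(0, -dx)]
            total[k] += a.sum()         # pairs that start at a class c cell
            same[k] += (a & b).sum()    # pairs that also end at a class c cell
    lags = np.arange(n_bins) * lag_interval
    density = np.divide(same, total, out=np.full(n_bins, np.nan), where=total > 0)
    return lags, density

# Usage: 40 m cells and 40 m lags out to 160 m, as in the case study's fit range.
rng = np.random.default_rng(2)
geology = rng.integers(0, 3, size=(30, 30))  # stand-in for an application quality map
lags, density = densogram(geology, cls=1, lag_interval=40.0, max_lag=160.0, cell_size=40.0)
```

For a spatially random map such as the stand-in above, the density at every lag stays near the class's overall proportion; clustered classes show higher densities at short lags.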
Spatial Dependence Model
This section describes the spatial dependence model used for this case
study. The spatial dependence model uses a combination of a random field
model and an inter-map cell swapping heuristic. Combining the random field
model and a cell swapping heuristic cancels out the main disadvantages
of each methodology.
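To make the random field half of this combination concrete, here is a minimal Python sketch of a spatially autocorrelated uniform random field. It does not reproduce the Ehlschlaeger & Goodchild random field model used in the case study; it swaps in a simpler, widely used construction (Gaussian smoothing of white noise followed by a rank transform), with a single hypothetical parameter `sigma` controlling the strength of spatial dependence.

```python
import numpy as np

def correlated_uniform_field(shape, sigma, rng):
    """Uniform [0, 1) random field with spatial autocorrelation.

    White noise is smoothed with a Gaussian kernel (circular convolution via
    FFT), then rank-transformed so that the marginal distribution is uniform.
    sigma is the kernel width in cells; sigma = 0 returns uncorrelated noise.
    """
    noise = rng.standard_normal(shape)
    if sigma > 0:
        fy = np.fft.fftfreq(shape[0])[:, None]
        fx = np.fft.fftfreq(shape[1])[None, :]
        # The Fourier transform of a Gaussian kernel is a Gaussian in frequency.
        transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fy ** 2 + fx ** 2))
        noise = np.fft.ifft2(np.fft.fft2(noise) * transfer).real
    # Rank transform: the smallest value maps to 0, the largest to (N - 1) / N.
    ranks = noise.ravel().argsort().argsort()
    return (ranks / ranks.size).reshape(shape)

def lag1_correlation(field):
    """Correlation between horizontally adjacent cells."""
    return np.corrcoef(field[:, :-1].ravel(), field[:, 1:].ravel())[0, 1]

# Usage: compare a correlated field with an uncorrelated one.
rng = np.random.default_rng(3)
smooth = correlated_uniform_field((64, 64), sigma=3.0, rng=rng)
rough = correlated_uniform_field((64, 64), sigma=0.0, rng=rng)
```

Fields like `smooth` could serve as starting inputs for a cell swapping heuristic, which is the role correlated random fields play in the comparison that follows.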
Random field models can create reasonably precise representations of
goal surfaces in a comparatively short period of time. However, random
field models require multiple parameters to be determined in order to provide
a good fit. Variogram models have sill, range, and nugget parameters, and
the random field model used in this case study (from Ehlschlaeger &
Goodchild, 1994a) has three parameters that must be tuned to achieve a
decent fit. Cell swapping
heuristics, on the other hand, require no parameters. However, they consume
many computer CPU cycles to complete, and the resulting solutions can fit
poorly. The following figures illustrate this behavior.
Figure two shows two classes, a best case and a worst case, from a typical
realization (chosen at random). The classes labeled (goal) indicate the
lag densities desired. The dashed lines (cor) are the lag densities that
correlated random fields will create before any cell swapping. The dotted
lines (uncor) are lag densities created by uncorrelated random fields before
cell swapping occurred.
Figures 3-6 illustrate the lag densities of a typical realization before
(figures three & five) and after (figures four & six) cell swapping.
Figures three and four show swapping applied to uncorrelated random fields
while figures five and six use correlated random fields. In both swapping
sets, only lag densities to 160 m were fitted, using a least-squares
estimator. Notice the uncorrelated random fields could, at best, achieve 0.01
density error for rook's-case adjacent cells after swapping (figure four).
Using correlated random fields, density errors were in the range of 0.0001-0.0005
for rook's-case adjacent cells after swapping (figure six).
All tests were run on a 200 MHz Pentium computer. The probability distribution
required 15 minutes to complete. The random field estimator and generator
took 9.5 hours to construct correlated random fields with less error than
the results of the cell swapping heuristic with uncorrelated random fields
as its inputs. The cell swapping heuristic with uncorrelated random fields
required 47 hours of swapping time to complete. Thus, the uncorrelated
cell swapping heuristic was slower and less accurate than the random field
model (even without manual fitting). The correlated cell swapping heuristic
was the slowest of all tests. Finding the optimal solution took 547.5 hours
of processing time, even though it swapped only 41.5% as many cells as the
uncorrelated cell swapping heuristic.