
A continuing challenge in image processing is the classification of spatial imagery into categories. The categories of interest vary with the application; common ones include urban areas, roads, bodies of water, grasslands, scrub brush, and various tree types (deciduous hardwoods, evergreen pines, mixed). Data products generated by classifying imagery into categories serve a variety of applications, including reconnaissance, assessment of environmental damage, land use monitoring, radiation monitoring (Badr 1993, Cressie 1991), urban planning, growth regulation, soil evaluation, and crop yield assessment (Oliver and Webster 1989).
Geocomputational software packages such as ERDAS Imagine include various classification algorithms as part of their basic menu of functions and tools, or as attachable modules. One such classification approach, supervised classification, involves selecting a subset of the data set for which the correct classification categories are already known. The classification algorithm uses these values to "train" its parameters on this subset in an optimal manner, usually by minimizing some error metric. The trained algorithm is then used to classify unknown data values ("test" values) into these same categories. If the correct categories are also known for these test values, the accuracy of the classification can be computed. Designed experiments on the same data can be used to compare classification accuracy for different approaches.
Selection of training points has traditionally been performed by an analyst who visually categorizes data points and chooses a sufficient sample of points from each category to train the algorithm. This has two drawbacks: it is generally time-consuming, and it requires an analyst capable of performing such a visual classification, an expertise that not all image analysts have in abundance. It would thus be desirable to find approaches that minimize or eliminate the need for human selection of training points without sacrificing classification accuracy on the test data. A sampling strategy from geostatistical analysis was considered a promising effort in this direction, and the rest of this paper describes some initial efforts to evaluate the potential of this idea.
Geostatistics is a term commonly used to describe a set of techniques that model spatial variation in data and use these models to estimate or classify other data. Geostatistics grew out of empirical approaches devised by South African mining engineers in the 1950s and 1960s (Krige 1989) and was given theoretical validity by the development of random function theory in the 1960s (Matheron 1970).
The basic concept of geostatistics is that of scales of spatial variation. Data that are spatially independent show the same variability regardless of the location of the data points. In most cases, however, spatial data are not spatially independent: data values that are close together show less variability than data values that are farther apart. The exact nature of this pattern varies from data set to data set; each data set has its own function relating variability to the distance between data points. This variability is generally computed as a function called the semivariance, which can be described by
$$\hat{\gamma}(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ z(x_i) - z(x_i + h) \right]^2$$
where z(x_i) denotes the data value at location x_i, h is the distance between data values, and n(h) is the number of pairs of data values a distance h apart.
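As a rough illustration of how the semivariance is computed from sample data, the sketch below takes half the average squared difference between all pairs of values whose separation falls in each distance bin. The function name, the binning by rounded lag, and the five-point transect are assumptions made for the example, not part of the original analysis.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width, max_lag):
    """Return {lag distance: semivariance} for pairs binned by separation."""
    sq_diff_sums = defaultdict(float)  # sum of squared differences per lag bin
    pair_counts = defaultdict(int)     # n(h): number of pairs per lag bin
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            lag_bin = round(h / lag_width)
            if lag_bin == 0 or h > max_lag:
                continue  # skip near-coincident pairs and pairs beyond max_lag
            sq_diff_sums[lag_bin] += (values[i] - values[j]) ** 2
            pair_counts[lag_bin] += 1
    return {
        lag_bin * lag_width: sq_diff_sums[lag_bin] / (2 * pair_counts[lag_bin])
        for lag_bin in sorted(pair_counts)
    }


# Toy transect: five points one unit apart with a steady linear trend.
transect = [(float(x), 0.0) for x in range(5)]
z_values = [0.0, 1.0, 2.0, 3.0, 4.0]
gamma = empirical_semivariogram(transect, z_values, lag_width=1.0, max_lag=3.0)
print(gamma)  # {1.0: 0.5, 2.0: 2.0, 3.0: 4.5}
```

For this trend-only toy data the semivariance keeps growing with distance; real imagery would typically level off at the scale of spatial variation.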
A plot of the semivariance versus distance between data values is known as a semivariogram, or simply as a variogram; we will use the latter term in this paper. The variogram is the central analytical tool used in geostatistics. A sample variogram is given in Figure 1:
Figure 1 shows that at short distances, the variation is small, and the variation increases with distance until it stabilizes at a certain distance. This distance is a scale of spatial variation which can be used for several purposes. In standard geostatistics, various models are fitted to the variogram and the best model is chosen to estimate data values at unknown locations, a process known as kriging. There are various forms of kriging estimation, but since kriging is not part of the work reported in this paper, they will not be discussed. The main use of geostatistics in the work reported here is the acquisition of a primary scale of variation to sample points for supervised classification.
(It should also be noted that the variogram example shown here is a very simple case; some variograms show no relationship between semivariance and distance, and others show more than one stabilization point, indicating multiple scales of variation. Again, however, this does not apply to the work reported here.)
The hypothesis being examined here is that, since the variogram gives the scale or scales of the data’s spatial variation, these scales can be used to determine sampling strategies for various applications. Specifically, if the variogram of the data indicates a particular scale of spatial variation, then sampling at an interval smaller than this scale (typically half the distance) should detect all of the variation. This would mean that training points could be chosen on a random grid based on this scale rather than by the more traditional, time-consuming and expertise-requiring approach of hand-picking data points for each category from the image.
The area chosen for this paper is Fort A.P. Hill, a U.S. military reservation in central Virginia. The U.S. Army Topographic Engineering Center has several sources of imagery for this area, including 20-meter resolution multispectral SPOT imagery and 1-meter resolution Computerized Airborne Multicamera Imaging System (CAMIS) imagery. (Resolution is the distance represented by one pixel of the imagery.) There is also accurate ground truth information available for significant segments of the Fort A.P. Hill area, which permits both training for supervised classification and accuracy assessment of the classification testing.
Fort A.P. Hill’s geographic location is shown in Figure 2A; a mosaic photo of Fort A.P. Hill is shown in Figure 2B.
FIGURE 2A
FIGURE 2B
Previous geostatistical analysis of Fort A.P. Hill has revealed a scale of spatial variation at approximately 320 meters (Oliver and Webster 1998). This distance was halved, as is the current wisdom in geostatistical analysis, to create a grid of 99 points 160 meters apart. These points were then used to train a maximum distance supervised classifier; the classifier was then tested on 256 points outside the training region. For the analyst-chosen approach, analysts chose four or five points representing each of the categories to be classified; these were used to train the same classifier, which was then tested on the same 256 points.
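The grid construction can be sketched as below: the variogram scale is halved to obtain the spacing, and training locations are laid out at that spacing. The 11 x 9 arrangement and the origin at (0, 0) are hypothetical, chosen only because they yield 99 points; the actual layout of the grid is not given here.

```python
def grid_from_variogram_scale(scale_m, x_min, y_min, n_cols, n_rows, fraction=0.5):
    """Lay out training locations at a fraction (default one half) of the scale."""
    spacing = scale_m * fraction
    return [
        (x_min + col * spacing, y_min + row * spacing)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


# 320 m scale of variation -> 160 m spacing; 11 x 9 layout is an assumption.
training_points = grid_from_variogram_scale(320.0, 0.0, 0.0, n_cols=11, n_rows=9)
print(len(training_points))   # 99
print(training_points[1])     # (160.0, 0.0)
```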
The metrics used for comparison of approaches were the error matrix and the kappa coefficient. The error matrix (also called the confusion matrix) is a k x k matrix, where k is the number of classification categories. The error matrix gives the counts of how each of the test points was classified. The rows represent the classified data by category, and the columns represent the reference data by category. Correct classifications are recorded on the matrix diagonal, while incorrect classifications fall in off-diagonal positions. The error matrix allows measurement of overall accuracy, category accuracy, producer’s accuracy (percentage correct in each column) and user’s accuracy (percentage correct in each row) (Congalton 1991). Error matrices for all experiments are given in the next section.
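As an illustration of these accuracy measures, the sketch below computes overall, producer's and user's accuracy from an error matrix laid out as described (rows are classified categories, columns are reference categories). The counts are those of the Test 1 geostatistically chosen matrix in Table 1; the function name is an assumption for the example.

```python
def accuracy_summary(matrix):
    """Return (overall, producer's, user's) accuracy from a k x k error matrix."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]
    overall = sum(diagonal) / total
    # Producer's accuracy: correct count divided by the reference (column) total.
    producers = [d / c if c else float("nan") for d, c in zip(diagonal, col_sums)]
    # User's accuracy: correct count divided by the classified (row) total.
    users = [d / r if r else float("nan") for d, r in zip(diagonal, row_sums)]
    return overall, producers, users


# Order of categories: urban, water, field, evergreen, hardwood.
test1_geostat = [
    [2, 2, 3, 5, 0],
    [0, 0, 0, 1, 0],
    [0, 3, 17, 9, 10],
    [1, 4, 4, 85, 27],
    [0, 0, 0, 18, 65],
]
overall, producers, users = accuracy_summary(test1_geostat)
print(round(overall, 3))  # 0.66
```

Here 169 of the 256 test points fall on the diagonal, which gives the overall accuracy of about 66%.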
For this work, the kappa coefficient is of more interest. The kappa coefficient is a measure of association between two categorical variables. It is widely used in remote sensing classification to assess the degree of success of a classification approach. In more general categorical data analysis, the kappa coefficient is used to measure the agreement between two observers on the same data; for remote sensing, it is used to measure the agreement between the classification approach and the actual answers.
A value of 0 indicates no agreement between the two observers except that expected by chance; a value of 1 indicates perfect agreement, with all the values falling on the diagonals (Agresti 1990).
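If n(i,j) represents the error matrix count in the ith row and jth column, n(i,+) represents the sum of the ith row, n(+,j) represents the sum of the jth column, and n represents the total count in all cells of the error matrix, the estimate for kappa is
$$\hat{\kappa} = \frac{P_o - P_e}{1 - P_e}$$
where
$$P_o = \frac{1}{n} \sum_{i=1}^{k} n(i,i)$$
is the observed proportion of agreement, and
$$P_e = \frac{1}{n^2} \sum_{i=1}^{k} n(i,+)\, n(+,i)$$
is the proportion of agreement expected by chance.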
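The kappa estimate can be sketched in code as the observed agreement (diagonal proportion) corrected for the agreement expected by chance (from the row and column totals). The function name is an assumption; the counts are the two Test 2 error matrices from Table 2.

```python
def kappa_estimate(matrix):
    """Estimate kappa from a k x k error matrix of counts."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = sum(matrix[i][i] for i in range(k)) / n
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


# Order of categories: urban, water, field, evergreen, hardwood.
test2_analyst = [
    [3, 1, 12, 25, 5],
    [0, 3, 0, 3, 0],
    [0, 1, 10, 1, 6],
    [0, 4, 2, 68, 24],
    [0, 0, 0, 21, 67],
]
test2_geostat = [
    [3, 0, 6, 7, 0],
    [0, 3, 1, 18, 0],
    [0, 1, 14, 5, 8],
    [0, 4, 3, 65, 21],
    [0, 1, 0, 23, 73],
]
kappa_analyst = kappa_estimate(test2_analyst)
kappa_geostat = kappa_estimate(test2_geostat)
print(round(kappa_analyst, 3), round(kappa_geostat, 3))  # 0.394 0.427
```

These values agree with the Test 2 kappas reported in Table 3.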
Since we are interested here more in the comparison between two different kappas, we also need an estimated variance for kappa; this is a long formula which may be found in (Congalton and Green 1999), p. 50. The derivation of the variance formula can be found in (Fleiss 1969). If k1 is the estimated kappa for one approach, k1var is its estimated variance, k2 is the estimated kappa for the second approach, and k2var is its estimated variance, then
$$Z = \frac{k_1 - k_2}{\sqrt{k_{1,\mathrm{var}} + k_{2,\mathrm{var}}}}$$
will be approximately a standard normal variable. We can test the hypothesis that the two kappas are equal, versus the alternative that they are not, by comparing Z against the standard normal distribution and rejecting if the absolute value of Z is greater than a critical value (1.96 for a test at the 95% level).
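A small sketch of this comparison, using the Test 1 kappas and variances from Table 3, is given below. The function name is an assumption; the statistic is the difference of the two kappas divided by the square root of the sum of their variances, taken in absolute value for the two-sided test.

```python
import math


def kappa_z_statistic(k1, k1var, k2, k2var):
    """Absolute standardized difference between two independent kappa estimates."""
    return abs(k1 - k2) / math.sqrt(k1var + k2var)


# Test 1 values from Table 3: analyst-chosen vs geostatistically chosen.
z = kappa_z_statistic(0.44, 0.001681, 0.467, 0.001936)
significantly_different = z > 1.96  # two-sided test at the 95% level
print(round(z, 4), significantly_different)  # 0.4489 False
```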
Initial kappa coefficients were computed in ERDAS Imagine; the error matrices were then re-entered into SAS statistical software to obtain the variance estimates needed to compare the different tests.
Two independent experiments were conducted by the authors. Seven categories were tested: urban, water, grass, evergreen, hardwood, scrub and road. Because none of the results achieved a classification in scrub and road, it was necessary to collapse the categories; scrub and grass were combined into field, and road was combined into urban.
The error matrices for the two experiments are shown in Tables 1 and 2.
TABLE 1:

TEST 1, GEOSTATISTICALLY-CHOSEN ERROR MATRIX

          URBAN   WATER   FIELD   EVERGRN   HARDWD   ROWSUM
URBAN         2       2       3         5        0       12
WATER         0       0       0         1        0        1
FIELD         0       3      17         9       10       39
EVERGRN       1       4       4        85       27      121
HARDWD        0       0       0        18       65       83
COLSUM        3       9      24       118      102      256

TEST 1, ANALYTICALLY-CHOSEN ERROR MATRIX

          URBAN   WATER   FIELD   EVERGRN   HARDWD   ROWSUM
URBAN         2       0      10        16        4       32
WATER         0       3       0         5        0        8
FIELD         1       2      13         5        9       30
EVERGRN       0       2       1        66       14       83
HARDWD        0       2       0        26       75      103
COLSUM        3       9      24       118      102      256
TABLE 2:

TEST 2, ANALYTICALLY-CHOSEN ERROR MATRIX

          URBAN   WATER   FIELD   EVERGRN   HARDWD   ROWSUM
URBAN         3       1      12        25        5       46
WATER         0       3       0         3        0        6
FIELD         0       1      10         1        6       18
EVERGRN       0       4       2        68       24       98
HARDWD        0       0       0        21       67       88
COLSUM        3       9      24       118      102      256

TEST 2, GEOSTATISTICALLY-CHOSEN ERROR MATRIX

          URBAN   WATER   FIELD   EVERGRN   HARDWD   ROWSUM
URBAN         3       0       6         7        0       16
WATER         0       3       1        18        0       22
FIELD         0       1      14         5        8       28
EVERGRN       0       4       3        65       21       93
HARDWD        0       1       0        23       73       97
COLSUM        3       9      24       118      102      256
Results of comparing kappas are shown in Table 3. It can be seen that the geostatistically-chosen method is not significantly different from the analyst-chosen method; indeed it had higher kappas in both cases, although not at a statistically significant level.
TABLE 3:

TEST 1                     ANALYST        GEOSTAT
KAPPA                      0.44           0.467
KAPPA VARIANCE             0.001681       0.001936
KAPPA STD ERROR            0.041          0.044
95% CONF INT               (.359,.521)    (.380,.554)
Z VALUE                    0.4489
SIGNIFICANTLY DIFFERENT?   NO

TEST 2                     ANALYST        GEOSTAT
KAPPA                      0.394          0.427
KAPPA VARIANCE             0.001764       0.001748
KAPPA STD ERROR            0.042          0.043
95% CONF INT               (.312,.477)    (.342,.511)
Z VALUE                    0.549
SIGNIFICANTLY DIFFERENT?   NO
The results in this paper show preliminary evidence that choosing training points for supervised classification on a regular grid, with a spacing based on a spatial variation scale determined from a data variogram, produces results comparable to those produced by analyst-chosen training points. Advantages of the geostatistically-chosen approach are that image interpretation experience is not necessary to choose training points, and the results reported in this paper indicate a small savings in total processing time. A disadvantage is that sparser categories may not fall on the grid points, so some collapsing of categories may be necessary, as was the case in these experiments. However, this may actually turn out to be an improvement, since, depending on the sample size, extra categories may not be statistically justified.
Some directions for future work are: to run more tests on different areas of the A.P. Hill imagery; to run tests using different analysts, expert and otherwise; to run the comparison tests with other imagery; and to compare these two techniques on other supervised classification methods. The issue of the optimum number of categories, and which categories are closest to which other categories, also bears further consideration; a clear delineation between the categories would allow the use of a weighted kappa coefficient, where mistakes close to the "correct" answer are counted less than ones far away. A weighted kappa might reveal more information as well.
Agresti, "Categorical Data Analysis", John Wiley & Sons, 1990.
Badr, Oliver, Hendry and Durrani, "Spatial Variation in Soil Radon", Radiation Protection Dosimetry, Vol. 49, No. 4, 1993, pp. 433-442.
R.G. Congalton, "A Review of Assessing the Accuracy of Classification of Remotely Sensed Data", Remote Sensing of Environment, volume 37, pp. 35-46, 1991.
Congalton and Green, "Assessing the Accuracy of Remotely Sensed Data: Principles and Practices", Lewis Publishers, 1999.
Cressie, "Statistics for Spatial Data", John Wiley & Sons, 1991.
Fleiss, Cohen and Everitt, "Large-sample standard errors of kappa and weighted kappa", Psychological Bulletin, Volume 72, pp. 323-327, 1969.
Krige, Guarascio and Camisani-Calzolari, "Early South African Geostatistical Techniques in Today’s Perspective", in "Geostatistics", Kluwer, pp. 1-19, 1989.
Matheron, "The Theory of Regionalized Variables and Its Applications", Fontainebleau, 1970.
Oliver and Webster, "A Geostatistical Basis for Spatial Weighting in Multivariate Classification", Mathematical Geology, volume 21, No. 1, 1989.
Oliver and Webster, "Report of the Geostatistical Analysis of High Resolution Multispectral Imagery", Report for the U.S. Army European Research Office, July 1998.
Stokes, Davis and Koch, "Categorical Data Analysis Using the SAS System", SAS Institute Inc, 1995.