A Comparison of Supervised Imagery Classification Using Analyst-Chosen and Geostatistically-Chosen Training Sets
SHINE, James A. (email@example.com), WAKEFIELD, Gery I., U.S. Army Topographic Engineering Center, ATTN: CETEC-TR-G, 7701 Telegraph Road, Alexandria VA 22315-3864
Key Words: supervised classification, geostatistics, spatial variation, imagery analysis
A continuing challenge in image processing is the classification of spatial imagery into categories such as roads, urban areas, evergreen trees, deciduous trees, water, and grasslands. The accurate classification of images has a wide range of applications, including reconnaissance, assessment of environmental damage, land use monitoring, urban planning, and growth regulation.
One classification approach is supervised classification. The imagery is divided into training data and test data. The correct categories are known for the training data, and a classification rule is derived from this data. The rule is then used to classify the test data. Commonly used classifiers include classification trees, minimum distance statistical classifiers, and neural networks. The choice of training set can significantly influence the success of the classification.
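As an illustration of the supervised workflow described above, the following is a minimal sketch of a minimum distance (nearest class mean) classifier in plain Python. The pixel values, class names, and the function names `class_means` and `classify` are hypothetical and chosen only for illustration; this is not the ERDAS Imagine implementation.

```python
import math

def class_means(samples, labels):
    """Mean feature vector per class, computed from labeled training pixels."""
    sums, counts = {}, {}
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def classify(pixel, means):
    """Assign the class whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda lab: math.dist(pixel, means[lab]))

# Hypothetical two-band training pixels (values are made up).
train_pixels = [(10, 12), (12, 11), (80, 90), (82, 88)]
train_labels = ["water", "water", "grass", "grass"]

means = class_means(train_pixels, train_labels)

# Classify two unseen (test) pixels with the rule derived from the training set.
predicted = [classify(p, means) for p in [(11, 13), (79, 85)]]
```

Because the class means are computed only from the training pixels, a different training set yields different means and therefore a different classification of the same image.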
A common technique in imagery classification is the selection of good training data points by an experienced analyst. An image or set of registered images is viewed in image processing software such as ERDAS Imagine, and pixels that unambiguously belong to each of the desired classification categories are selected and used as training data. The entire image is then classified based on statistics derived from the training data. In cases where ground truth is available, classification accuracy can be assessed by use of an error matrix. This method can be time consuming and requires expertise on the part of the analyst, something not available in all classification settings.
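The error matrix assessment mentioned above can be sketched as follows. This is a minimal example in plain Python with made-up reference and predicted labels; the function names `error_matrix` and `overall_accuracy` are hypothetical, and rows are assumed to index the ground truth class while columns index the assigned class.

```python
def error_matrix(reference, predicted, classes):
    """Count pixels by (ground truth class, assigned class)."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix

def overall_accuracy(matrix):
    """Fraction of pixels on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical labels for six ground truth pixels (values are made up).
classes = ["water", "grass", "road"]
reference = ["water", "water", "grass", "grass", "road", "road"]
predicted = ["water", "grass", "grass", "grass", "road", "water"]

matrix = error_matrix(reference, predicted, classes)
accuracy = overall_accuracy(matrix)
```

Off-diagonal entries show which categories are confused with one another, which the single overall accuracy figure does not reveal.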
A recent approach uses spatial variation scales from geostatistical analysis, rather than an analyst's choices, to select the training points. A semivariogram is computed on the pixel values of an image, and a spatial variation scale is determined from this semivariogram. A grid of points whose spacing is derived from this scale (usually 50 percent of the scale) is then selected as the training data and used for the classification. This approach does not require an experienced analyst.
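The geostatistical selection described above can be sketched under some loudly stated assumptions. The empirical semivariance at lag h is taken as half the mean squared difference between pixel pairs separated by h along rows and columns. The rule used here to read a scale off the semivariogram (the first lag reaching 95 percent of the image variance) is only an illustrative heuristic and is not claimed to be the authors' method; the synthetic image and the function names are likewise hypothetical.

```python
import statistics

def semivariogram(image, max_lag):
    """Empirical semivariance for row and column lags 1..max_lag."""
    rows, cols = len(image), len(image[0])
    gamma = {}
    for h in range(1, max_lag + 1):
        total, n = 0.0, 0
        for r in range(rows):
            for c in range(cols):
                if c + h < cols:  # pair along the row
                    total += (image[r][c] - image[r][c + h]) ** 2
                    n += 1
                if r + h < rows:  # pair along the column
                    total += (image[r][c] - image[r + h][c]) ** 2
                    n += 1
        gamma[h] = total / (2 * n)
    return gamma

def variation_scale(gamma, sill, fraction=0.95):
    """Assumed heuristic: first lag whose semivariance reaches fraction * sill."""
    for h in sorted(gamma):
        if gamma[h] >= fraction * sill:
            return h
    return max(gamma)

def training_grid(rows, cols, scale, ratio=0.5):
    """Regular grid of (row, col) points spaced at ratio * scale pixels."""
    step = max(1, round(scale * ratio))
    return [(r, c) for r in range(0, rows, step) for c in range(0, cols, step)]

# Synthetic 16 x 16 single-band image made of 4 x 4 blocks (values are made up).
image = [[100 if (r // 4 + c // 4) % 2 else 0 for c in range(16)] for r in range(16)]

gamma = semivariogram(image, max_lag=6)
sill = statistics.pvariance([v for row in image for v in row])  # variance as a sill proxy
scale = variation_scale(gamma, sill)
grid = training_grid(16, 16, scale)  # spacing is 50 percent of the scale
```

In practice the scale is often taken from a model fitted to the semivariogram rather than from a threshold rule, and the class label for each grid point still has to come from ground truth.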
Experiments comparing these two approaches have been conducted using several registered images of Fort A.P. Hill, Virginia, which have accompanying accurate ground truth. The experiments were performed in ERDAS Imagine 8.3 using a minimum distance supervised classifier. The error matrices for the two approaches were not statistically different. Geostatistically chosen training data has the potential to reduce the need for experienced image analysts to perform imagery classification. Further developments may allow greater automation of the imagery classification process.