Geographic Information Processing and Digital Cartography, by their very nature, employ digital techniques and sophisticated algorithms for analyzing and displaying digital geographic data. One assumes that the software one uses contains methodologies and algorithms to solve the particular geographic or spatial problem at hand, accurately and precisely. We implicitly have confidence in the implementation of these algorithms, and generally assume that the results produced by these GIS are as accurate as the developers of the software intended them to be. We put faith in their judgement for the selection and implementation of various algorithms for solving, in this context, spatial geographic problems. However, sophistication of software and accuracy of results may not go hand in hand. A simple case in point is the calculation of the perimeter and area of a polygon, within a raster data structure. This calculation seems simple enough, but in extreme cases, most GIS systems produce errors of up to 40% for the perimeter calculation, and up to 10% for area calculations. This paper will examine the typical algorithms used for calculating the perimeter and area statistic, and will present an improved algorithm that produces much less error,
The structure of a spatial data set determines the type of algorithms employed to analyze the data. In a GIS, the two main data structures are raster and vector, and each has properties suitable for solving problems unique to their data structure. Each has their advantages and disadvantages for solving these geographic problems, and algorithms that are simple to implement in a raster environment may be difficult to implement in a vector environment, and vice versa. Traditional methods of calculating the perimeter and area of a polygon, in raster and vector format, will expose the weakness of the raster calculation.
In a vector data structure, the perimeter of a polygon is easily and accurately calculated using the Pythagoras formula. The distance between two vector coordinates, x1,y1 and x2,y2 is given by the formula
For a closed polygon containing N vertices (where xN=x1 and yN=y1), a simple summation of the lengths of all the edges of the polygon will result in the perimeter of the polygon. Stated as a formula,
The area of a vector polygon is also easily computed, using several techniques. Using the Trapezoidal method, each side of the polygon forms a trapezoid with the X axis (see Figure 1).
Figure 1. Vector polygon with each side subtending a trapezoid
The area of a trapezoid subtended by polygon side A, from coordinate x1,y1 to x2,y2 is
This is effectively the width times the average height, and gives the area of an equivalent rectangle. For a closed polygon containing N vertices (where xN=x1 and yN=y1), a simple summation of the areas of all the subtended trapezoids of the polygon will result in the total area of the polygon, assuming a clockwise directed polygon. Stated as a formula,
In raster space, the algorithms for computing perimeters and areas of polygons (or raster areas, contiguous areas of common value) are generally quite simplistic, and not as precise as the vector methods. For the area calculation, the numbers of pixels contained within the polygonal area are counted. The pixel size is a known quantity, so the area of a pixel is also known. Multiplying the number of pixels by the area of one pixel results in the area of the polygon. For perimeters of raster polygonal areas, a more complex algorithm is employed. All edges of the polygonal region (an edge of a raster polygon exists when a pixel of the raster polygon has an adjacent pixel of a different value) are determined and counted. Multiplying the number of edges by the length of the side of a pixel results in the perimeter of the polygonal area. The difficulty with this technique is the inherent problem of the raster data structure – aliasing in the representation of non-vertical or non-horizontal lines.
When a vector map is rasterized for use within a GIS, the horizontal and vertical sections of boundary lines are converted into straight rows or columns of pixels. However, the other ‘diagonal’ sections of lines are converted into ‘stairstep’ like lines, using raster pixels, which attempt to follow the straight vector lines. This results in aliasing, that jagged look to the lines due to the pixelation effect. The finer the resolution (i.e. the smaller the pixel), the less apparent is the aliasing. Nevertheless, this aliasing causes an overestimation in the number of edges used to represent the boundary of a region in raster space. When a diagonal section of the perimeter of the polygon is represented in raster space, typical algorithms count the two exposed edges of the pixel as contributing to the perimeter, or two times the length of the side of the pixel. A more accurate calculation would count the diagonal distance across the pixel, or root 2 times the length of the side of the pixel. Figure 2 demonstrates the aliasing problem in the raster representation of vector objects. This simplicity of current algorithms, which ignore the significance of the aliasing problem, contributes to the errors of the perimeter and area statistic. By experimentation, using the nine square raster file described later, it was proven that IDRISI, EPPL7, PCI, ArcView and SPANS GIS all implement the same simple algorithms, since the results of their perimeter and area calculations were identical among them. Because of this, any subsequent reference to IDRISI is interchangeable with any of the above mentioned GIS systems. This simple experiment can easily determine if these simple algorithms are implemented in any GIS system.
Figure 2. A rotated square and its rasterized equivalent
The algorithm presented here attempts to compensate for the aliasing problem by adjusting for diagonal sections of polygon boundaries. To determine the appropriate contribution of an edge pixel to the perimeter, a scheme was devised to determine edge pixels (or pixels with edges bounding on unlike pixels), and their corresponding perimeter contribution. A 3x3 pixel matrix, or moving window, was defined that would represent every combination of up to 8 like pixels surrounding the centre pixel. Each of the 8 surrounding pixels would have a unique position code, as shown in Figure 3. Summing these codes, when an adjacent pixel has the same value as the central pixel, would result in a number between 0 and 255, or one of 256 unique pattern values. Figure 4 shows the 256 different surround patterns in sequential order. Each unique pattern value has a perimeter value associated with its unique pattern, and, by examining each pattern, an appropriate perimeter value was selected for each pattern. A pattern could have one of six perimeter values (1, 2, 3, 4, 1.414 or 2.828), depending on the pattern shape with respect to the central pixel. A pattern with a perimeter value of 1 has one exposed edge on the polygon boundary; a perimeter value of 3 has three exposed edges on the polygon boundary; a perimeter value of 1.414 (square root of 2) represents a diagonal line through the pixel; a perimeter value of 2.828 has two diagonal lines through the pixel. Figure 5 shows the pattern values and their associated perimeter values.
The algorithm procedure would systematically apply the moving window at every pixel, compute the surrounding pixel pattern value, and determine the associated perimeter value. Then, for each class of pixel in the raster image, a running total of perimeter values would be accumulated. Once all pixels in the raster image are processed, the running total for each pixel class is multiplied by the length of a pixel edge, resulting in the perimeters of all the polygons (areas) in the image.
Figure 3. The 3x3 pixel matrix moving window, showing pixel position codes
Figure 4. The 256 unique pixel surround patterns
Figure 5. The 256 pattern values and their associated perimeter values
To observe the improvement in accuracy of the computed perimeters and areas, two tests were performed.
Nine vector squares, 100 units on a side, were created, each rotated by five-degree increments, from 5 to 45 degrees, to show the effects of increased aliasing on the square boundaries. Elementary geometry shows that each square has a perimeter of 400 units, and an area of 10000 square units, regardless of its orientation in vector space. The nine squares were then rasterized in the IDRISI GIS, at 5 different resolutions (75, 150, 300,600 and 1200 pixels for an image 600 units square, or 8,4,2,1 and 0.5 units/pixel respectively), and the perimeter and area of each rasterized square was computed and compared to its vector equivalent. The improved algorithm was also used to calculate the perimeters and areas of the rasterized squares. Figure 6 shows the nine squares superimposed on their rasterized versions. The results of the IDRISI perimeter calculation, including average error from the exact vector values, is shown in Figure 7. As can be seen, errors of up to 40% can result from area boundaries containing severe aliasing, such as the square rotated by 45 degrees. Figure 8 graphically shows, as rotation angle increases, aliasing increases, and corresponding error increases asymptotically to a maximum of about 38%, regardless of resolution. The results of the improved algorithm perimeter calculation, including average error from the exact vector values, are shown in Figure 9. As can be seen, error has been significantly reduced to about 8.4% for the most severely aliased square. Figure 10 graphically shows the error of the perimeter calculation, at 5 different resolutions, using the improved algorithm. A maximum occurs at about 11.5%.
Figure 6. Test 1 – Nine rotated squares superimposed on their raster equivalents
Resolution - Perimeter Values
Resolution - %Error
Figure 7. Table of IDRISI perimeter values for nine rotated squares, at five different resolutions
Figure 8. Chart of IDRISI perimeter error for nine rotated squares, at five different resolutions
Resolution - Perimeter Values using
Resolution - %Error using
Figure 9. Table of improved algorithm perimeter values for nine rotated squares, at five different resolutions
Figure 10. Chart of improved algorithm perimeter error for nine rotated squares, at five different resolutions
A vector map of China was selected due to the variety of polygon shapes and sizes within its boundaries. The map, containing 29 polygonal regions, was rasterized in IDRISI, at 5 different resolutions (80x70, 160x140, 320x280, 640x560 and 1280x1120 pixels for an image 320x280 units, or 4,2,1,0.5,0.25 units/pixel respectively), and the perimeter and area of each rasterized polygon was computed and compared to its vector equivalent. This was done to show the effect that resolution had on the perimeter/area calculation results from IDRISI. The improved algorithm was also used to calculate the perimeters and areas of the polygons. Figure 11 shows an example of the vector map superimposed on the rasterized version, at the selected resolution. The aliasing is evident on most of the polygon boundaries. The results of the IDRISI perimeter calculation, including average error from the exact vector values, are shown in Figure 12. As can be seen, errors of up to 28% can result from area boundaries containing severe aliasing. Figure 13 graphically shows, at each resolution, the perimeter error for each polygon using the IDRISI perimeter calculation. Error typically gets worse, ranging from an average of 8.5% to 25.7%, as resolution increases, due to increased aliasing. The results of the improved algorithm perimeter calculation, including average error from the exact vector values, are shown in Figure 14. Figure 15 graphically shows the error of the perimeter calculation, at 5 different resolutions, using the improved algorithm. In this case, error improves, ranging from an average of 15.9% to 1.3%, as resolution increases. This can be attributed to the diagonal compensation properties of the improved algorithm, which has a better performance as resolution increases. At higher resolutions, there are smaller sized, but more aliased or diagonal sections on raster polygonal boundaries. Thus, at the highest resolution, 1280x1120 pixels for an image that is 320x280 in vector space (0.25 units/pixel), the average error reduces from about 25.7% to about 1.3%
Figure 11. Test 2 – Rasterized map of China with vector boundary overlay (80x70 pixels)
|Resolution - Perimeter Values using IDRISI|
|Resolution - %Error using IDRISI|
|8.53||14.90||21.23||24.52||25.73||Average % Error|
Figure 12. Table of IDRISI perimeter values for the 29 polygons of China, at five different resolutions
Figure 13. Chart of IDRISI perimeter error for the 29 polygons of China, at five different resolutions
|Resolution - Perimeter Values using Improved Algorithm|
|Resolution - %Error using Improved Algorithm|
|15.90||5.68||2.12||1.90||1.32||Average % Error|
Figure 14. Table of improved algorithm perimeter values for the 29 polygons of China, at five different resolutions
Figure 15. Chart of improved algorithm perimeter error for the 29 polygons of China, at five different resolutions
It was originally thought, when this research began, that corrections for aliasing would improve the perimeter and area calculations in raster GIS. We have confirmed that this is true for the perimeter calculation, but little improvement can be made to the results of the area calculation. Errors in the area calculation are a result of the rasterization process, and the raster representation of a vector boundary. Upon examining Figure 11, one can see that the vector boundary includes parts of pixels and excludes parts of other pixels. In raster space, this ‘extra’ area and ‘excluded’ area, along the perimeters, cancel each other out. At higher resolutions, these extra and excluded areas are much smaller, since pixel size is smaller, and so this ‘cancelling out’ process is more precise. Figure 16 and Figure 17 show the error of the area calculations for the 9 squares and China map using IDRISI. As can be seen, errors are typically much less than 1% at the higher resolutions.
|Resolution - % Error using IDRISI Area Calculation|
|Rotation in Degrees||75x75||150x150||300x300||600x600||1200x1200|
|1.23||0.59||0.18||0.12||0.02||Average % Error|
Figure 16. Area calculation for the nine squares using IDRISI
|Resolution - % Error using IDRISI Area Calculation|
|4.456993||1.476211||0.694373||0.233276||0.060669||Average % Error|
Figure 17. Area calculation for the China map using IDRISI
Empirical evidence shows that, as the resolution becomes coarser, the perimeter results improve, while the area results degrade. As the resolution becomes finer, the perimeter results degrade while the area results improve. This can be attributed to, at finer resolutions, the greater number of smaller aliased pixels on diagonal edges.
The preceding research has verified that several GIS packages use simplistic methods to calculate perimeters and areas of raster regions. For perimeter, these systems count the number of pixel edges that make up the perimeter of a raster region, multiplied by the length of the side of a pixel. For area, they count the number of pixels contained in a raster region, multiplied by the area of a pixel. The results for perimeter can be in error by as much as 40%. The algorithm presented here compensates for the aliasing, and shows significant improvement in the perimeter results, by an order of magnitude. For area calculations, it was determined that no significant improvement could be made to the existing method used in most GIS systems.
It must be noted that scale of the maps was never mentioned. The shape of the polygon, the scale of the map, and the size of the pixel (or the image resolution) all affect the results of the perimeter calculation. The boundary of a geographic region exhibits a fractal like behaviour, and the resolution of the image plays an important role in the ability to resolve the complex shape of a typical vector boundary in raster space. If resolution is poor, then significant features of reality are not represented properly, and so results from the perimeter calculation degrade, regardless of the method. It was thought that scale was not an issue, since the algorithm was improving upon the ‘measuring stick’ used to calculate the perimeter, regardless of scale. For classified remotely sensed imagery, where no real ‘vector boundary’ exists since the image is acquired in its native raster format, this algorithm can provide better perimeter results by approximating the apparent vector boundaries imbedded within the image.
Future work, to improve the performance of the algorithm, can include the resolution issue, possibly using fractal techniques, and adjustments to the perimeter values associated with each 3x3 pixel matrix.
I would like to thank Dr. Doug King and Evan Seed for their valuable insight and discussions pertaining to the implementation of the algorithm. I am also indebted to Danny Patterson, Jason Fournier, Casey Trull, Hassan Eljaji, and Bruce Thomas for their assistance in performing the perimeter and area experiment using the other GIS packages. Thanks to Klaus Carter for his invaluable help in formatting this document into HTML.
Clarke, Keith, "Analytical and Computer Cartography", Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1995
Monmonier, Mark S., "Computer Assisted Cartography – Principles and Prospects", Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1982
Cromley, Robert G., "Digital Cartography", Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1992