
Mitchell Morehart ^{1}, Fionn Murtagh ^{2}, and JeanLuc Starck ^{3}
^{1}Economic Research Service, USDA, 1800 M Street NW, Washington DC 20036, USA
Tel. +1 202 6945581, Fax +1 202 6945758
Email: morehart@econ.ag.gov
^{2}School of Computer Science, Queen's University of Belfast, Belfast BT7 1NN, Northern Ireland, UK
Tel. +44 1232 274620, Fax +44 1232 683890
Email: f.murtagh@qub.ac.uk
^{3}CEA/DSM/DAPNIA, 91191 GifsurYvette cedex, France
Email: jstarck@cea.fr
Geographic Information Systems (GIS) are increasingly used as tools for topographical applications and research. A comprehensive GIS is characterized by its capabilities in the areas of data processing, analysis, and post processing. This paper explores the use of the wavelet transform as a spatial analysis tool for modeling complex multivariate geographic relationships. The use of wavelets in spatial statistics is a relatively recent phenomenon that is rapidly developing. The appeal of wavelet methods stems from their ability to process noisy data with local structures and represents discontinuities such as jumps or peaks in a function. Several examples from agricultural data are used to illustrate the exploratory data analysis inherent in the wavelet transform. The resulting maps provide a convenient means of visually conveying tremendous amounts of information. The redundant à trous discrete wavelet transform is shown to aid enormously in feature detection and exploration in the succession of resolution views of the data. Analysis is carried out through the use of the MR/1 multiresolution image and data analysis package.
Through advances in computer technology, telecommunications, and spatial statistical theory, practitioners have recently gained a powerful new ability to obtain, organize, and study complex relationships between farming and its natural resource base. Applications of this new capability are broad, ranging from investigating the farmlevel benefits of precision farming to assessing the impact of conservation practices over wide geographic areas. Opportunities that arise from these developments also raise numerous questions relating to the interpretation, manipulation, and analysis of GIS data.
This paper explores the use of the wavelet transform as a spatial analysis tool for modeling complex multivariate geographic relationships. The wavelet transform is shown to facilitate several desirable capabilities as multiresolution support, smoothing, and noise removal, which enhance the construction and postprocessing of geographic data representations. Multiresolution support (Starck, Murtagh, and Bijaoui 1995) involves studying an image through information contained at various resolution levels. The statistical concept of smoothing is generally associated with nonparametric regression methods used to uncover a relationship within data, without strong a priori restrictions on its form. There are many examples of both liner and nonlinear applications of wavelet regression to this type of problem (see for example Donoho 1993, Donoho and Johnstone 1992, Walter 1992, DeVore and Lucier 1992, Kerkyacharian and Picard 1992, and Kerkyacharian and Picard 1993). We can advance two reasons why it is effective to work on tasks such as denoising or filtering in wavelet transform space. First, the wavelet transform is often found to be relatively sparse. Many coefficients are zero or small in value. Second, by virtue of the multiresolution decomposition we can base our filtering on scalerelated components in our data. Wavelet methods are appealing because of their ability to process noisy data with local structures and represent discontinuities such as jumps or peaks in a function. This is an important consideration for estimating geographicallyreferenced multivariate surfaces where a high degree of spatial inhomogeneity is expected.
The next section presents an overview of wavelet analysis with emphasis on application to geographic data analysis. A discussion of the data source and construction of a geographic information system follow this. The third section provides an empirical framework for analysis and the underlying methodology. In the fourth section, spatial distribution problems for two agricultural indicators are used to illustrate multiresolution spatial analysis. The first application involves determining those areas of the country with the highest corn yield. The second example identifies specific areas of the country where farms are most dependent on wheat as a source of income. Concluding remarks focus on the implications of these techniques for broader GIS application and areas for further research.
Wavelets are inherently tied to the concept of multiresolution analysis. Using wavelets, a function can be depicted as a course overall shape, plus detail that ranges from broad to narrow. This ability to zoom in or zoom out that is characteristic of multiresolution analysis permits an image to be interpreted as a sum of details that appear at different resolutions; furthermore, each scale of resolution may pick up different types of structure in the image.
The à trous Discrete Wavelet Transform (DWT) (Shensa 1992, Holschneider et al. 1989, Starck, Murtagh, and Bijaoui 1998) differs from the standard DWT in that decimation or subsampling is not carried out at successive resolution levels, in spite of the fact that the information carried by these resolution levels decreases. In fact, detail coefficients remain as numerous as the input data at these successive levels. This has great benefits for cartographic representation, as shown below. This sort of redundancy also aids in feature detection and exploration in the succession of resolution views of the data, which are provided by the detail coefficients. For given data x, a relationship based on the unknown function m using the à trous DWT gives us the following additive decomposition:
x = m + e = d_{1} + d_{2} + ... + d_{J} + c_{J}, 
(1) 
where the residuals e_{i} are assumed i.i.d. with mean zero and variance s^{2}.
Each term here is a vector of length p, or an image of dimensions n × p, or even a data cube. The sum is simply carried out elementwise, or pixelwise. The additive decomposition is one where the components are the succession of detail coefficients given by d_{j} which are dependent on the resolution level, plus the underlying smooth version of the data which is given by c_{J}.
In the continuous wavelet transform, CWT, the detail coefficients at any given resolution scale are obtained by convolving (implying translation ) the given data with continuously dilated versions of the wavelet function or mother wavelet . The term wavelet is explained by the fact that some wavelet functions have the appearance of a central bump with one or two diminishing ripples, and others are more like a tidal wave. At any rate, leaving aside such informal description, wavelet functions are nearly always of compact support, and, in order to guarantee an exact inverse transform, produce wavelet coefficients of zero mean at each level.
As shown above, the dilated versions of a wavelet function are used to produce d_{1}, d_{2},...d_{J}. A scaling function is used to produce c_{J}. The scaling function has the effect of smoothing the data. The à trous DWT has the nice property that dilated versions of a smoothing function are used to produce a set of successively smoothed versions of the input data, c_{1}, c_{2}, ... , c_{J} and then the wavelet coefficients can be defined straight away from these just by subtraction: d_{1} = x  c_{1}, d_{2} = c_{1} c_{2}, ... , d_{J} = c_{J1}  c_{J}. The scaling function commonly used is a B_{3} spline, known for its good interpolation properties that extend to irregularly sampled data. (The authors note that the data studied are irregularly sampled: the discrete convolution with the scaling function  whose shape and support determine the scalerelated resolution  provide mappings of the data onto a regular grid.)
Data dependent noise modeling may be based on the noise model posed for the input data, such as (in the additive noise model case) x = { X_{i} +e_{i} }_{i = 1}^{n} where e need not be i.i.d. Based on such a noise model, significance levels can be determined at successive resolution levels using the classical hypothesis testing approach.
The MR/1 multiresolution image and data analysis package (MR/1 1998, Starck, Murtagh, and Bijaoui 1998) was used in this work. This package handles (raster) image and point pattern data. An extensive range of wavelet and other multiscale transforms are supported (à trous, Mallat, Feauveau, Haar, and others, in various pyramidal and nonpyramidal versions; Laplacian, median and morphological transforms are also available). Data dependent noise models include Gaussian and Poisson, additive and multiplicative, stationary and nonstationary. Supported are such tasks as visualization, filtering, deconvolution, compression, feature and structure finding, and many other operations.
GIS are increasingly used as tools for topographical applications and research. A comprehensive GIS is characterized by its capabilities in the areas of: (1) data processing, (2) analysis, and (3) postprocessing. Spatial data are fundamental to the construction of any GIS and involve the integration or collection of data comprised of geographic identification and attribute information.
Spatial analysis of relationships between farming and its natural resource base are carried out using data collected from USDA's Agricultural Resource Management Study (ARMS). The ARMS is a personally enumerated survey, conducted since 1984 by the National Agricultural Statistics Service (NASS) and the Economic Research Service (ERS) of the U.S. Department of Agriculture. The ARMS is a probabilitybased multi frame, stratified survey that uses multiple questionnaire versions to collect information on farm production expenses, capital purchases, income, production practices, and other farm operating characteristics.
Local geographic references for sample points are often omitted from surveys designed to collect economic and financial information. In this instance, some form of data integration is required to associate these attributes with geographic references that are more specific than the usual political boundaries such as State or county. The simplest form of data integration involves attaching geographic references to sample points from survey data. More sophisticated forms of integration can be achieved by overlaying information from one GIS framework to another. In any event, the method of integration involves some assumption about the underlying continuity of sample data and the inherent level of statistical reliability established by the sampling scheme.
The most common geographic reference is the longitude/latitude coordinate system. Location also may be annotated by such systems as ZIP codes or highway mile markers. Any variable that can be located spatially can be fed into a GIS. For this study, geographic references were obtained for each of the 3,100 county population centroids. The assignment of latitude/longitude coordinates for each sample observations is based on the FIPS code. Not all counties contained samples due to survey coverage limitations or absence of agricultural activity. The degree of resolution is much more intense in the eastern half of the contiguous U.S. since the size and number of counties is much greater than in western states (Figure 1).
Figure 1: Location of country centroids with survey responses.
A final consideration for creating GIS data is map orientation or projection. A projection is a mathematical means of transferring information from the Earth's 3D curved surface to a 2D medium. Different projections are used for different types of maps because each projection differs in the way they handle distance, direction, shape and area. For example, a projection that accurately represents the shapes of the continents will distort their relative sizes. The Universal Transverse Mercator (UTM) projection is one of the most commonly used projections. Visually, it preserves shapes for small areas, but gives a distorted representation of the contiguous United States that is not easily recognized when compared with projections used by most atlas and road map presentations.
The projection selected for this analysis was the AlbersEqualArea (AEA). The AEA Conic projection uses two standard parallels to reduce some of the distortion produced when only one standard parallel is used. Although neither shape nor linear scale are truly correct, the distortion of these properties is minimized in the region between the standard parallels. The country centroid reference data was originally specified in decimal degrees. The latitude/longitude coordinate system georeferences were converted to AEA projected meter georeferences. For example, the population centroid of Carrol County, MO (FIPS 29033) is represented as 39.404 latitude and 93.498 longitude in decimal degrees and converts to 1,822,938.285 meters from 23 degrees latitude and 213,456.116 meters from 96 degrees longitude in the AEA projection.
Another component of data preparation for this study is binning. Binning methods originated with development of the histogram as a means to graphically display univariate data. More recently, binning has been used to reduce computational difficulties associated with applying nonparametric regression techniques for smoothing large data sets (Scott 1992, Härdle and Scott 1992, and Wand and Jones 1994). In this context, binning involves reducing the original data to a relatively small number of estimates at equallyspaced intervals. In practice this treatment of sample data has been used to support various kerneltype estimators such as the Average Shifted Histogram (Scott and Whittaker 1996), bivariate local linear regression (Werthenbach and Herrmann 1998), and Fast Fourier Transform (FFT) computation methods (Fan and Marron 1994). Donoho et al. 1995 provides examples for density estimation based on wavelet thresholding, while Antoniadis and Phan 1995 incorporated binning directly into a nonparametric wavelet regression estimator.
In addition to computational efficiencies, binning offers other important benefits to practical problems associated with spatial modeling of complex survey data. First, the binning algorithm developed by Scott allows the complex, probabilitybased sample characteristics to be incorporated in the analysis (Scott and Whittaker 1996). The second advantage of this approach is that the selection of bin width or grid spacing allows the practitioner some flexibility in controlling the degree of resolution given the statistical reliability constraints of the sample data. Results by Hall and Wand 1996 for kernel density estimation suggested that the accuracy of binning and the choice of grid size are dependent on the relative smoothness of the underlying density and sample size. Finally, as noted by Antoniadis and Pham 1995, binning can be used to compensate for several restrictions that pertain to widely used wavelet estimators such as fixed equidistant design, homoscedasticity, and sample sizes that are a power of two.
In the context of the à trous DWT introduced in the previous section, binning arises naturally as the discrete convolution of a scaling function with the given input data. We use, below, a heteroscedastic noise model with this wavelet transform (shown below).
Having established a GIS framework from which to examine multivariate relationships, the next step involves implementation of the wavelet transform in a 2D setting. The major goal of this process is to produce a sequence of versions of the binned approximation of the original data at more and more coarse resolutions separating noise from salient features in the data.
Extension of the wavelet transform to the 2D problems presented by the geographic data system described in the previous section is based on estimating the unknown mean response function:
^ 
(x,y) = E(ZX = x,Y = y) 
(2) 
where (x,y) represents the center of one of the geographically referenced bins, and Z represents the variable of interest. The preconditioned Z matrix is constructed by calculating the weighted average value within each of the bins as
^ 

= 
n 
w_{i}Z_{i} 

(3) 




n 
wi 

where n is the number of Z_{i}'s in B_{j,l1,l2} and w_{i} represents the corresponding sampling weights for each observation. In many cases, the estimate will be mathematically undefined or zero. The first situation occurs when there are no sample observations in the bin, while the latter reflects no positive response for the variable of interest.
The redundant à trous method has a range of useful properties for the handling of sparse spatial data. Compared to other wavelet transforms, we may note the following:
The handling of data boundaries in the DWT always requires attention. Implementations of DWT methods that use decimation usually require that the input data dimensions be integer powers of two; or else that the input data be padded to be of such prespecified dimensions. In the case of the redundant à trous method, we do not have such a constraint. In practice, we usually use reflection in the boundary (the "virtual'' value at position n+1 is taken as the known value at n, position n+2 is given by n1, and so on), which makes the convolutions with the scaling function welldefined at all stages.
Stationary transform
The redundancy involved in the succession of applications of the scaling function, yielding c_{1}, c_{2}, ... c_{J}, means that we ought not take each translated value into account when constructing the following resolution level: in fact, a sampling scheme based on gaps of {2^{i}}^{J1}_{i = 0} unused values between values used in the convolution does the job admirably. Such a sampling scheme is compatible with what would have been the case, had we decimated. (Note that decimation can be carried out with the à trous wavelet transform, if so desired.) This works well all the more so since we are using a scaling function with excellent interpolation properties. It is for this reason that the à trous ("with holes'' or gaps) method bears this name.
The à trous DWT is of linear computational complexity, i.e., O(n) for n input values (and is thus more efficient than, e.g. the FFT).
Due to redundancy, the à trous transform provides intuitively understandable visualization. The noise, or other, filtering of subsampled data can be entirely avoided; as noted in section 2, we have at our disposal a wide range of powerful data dependent noise models.
One of the unique features of wavelet analysis is that no matter what the function of interest, a number of alternative views that represent detail in the data are presented graphically. Visualization of the wavelet transform provides knowledge about the structure of the data and presents a convenient forum for conducting waveletbased diagnostics prior to reconstruction. The multiresolution aspect of the transform also permits the detection and parsing of objects within an image.
The figures below show the wavelet transform of the data as spatially mapped over the surface of the continental United States. These provide corn yield and wheat dependence "topographies" at varying resolution scale. Our original data, in both cases, is the sum of the 4 scales shown. The last scale is the most smoothed version retained of the data. The wavelet coefficients depicted in figures 2 and 3 represent only positive values.
Figure 2: From top to bottom: Corn yield data; detail or wavelet coefficients (nonnegative values shown) at successive resolution levels, d_{1}, d_{2}, d_{3}, c_{3}
Figure 3: From top to bottom: Economic dependence on wheat data; detail or wavelet coefficients (nonnegative values shown) at successive resolution levels, d_{1}, d_{2},d_{3}, c_{3}.
Multiresolution representation of GIS structures facilitates the extraction of information from the data. Typically, image or data filtering is focused on the highfrequency (rapidlyvarying) components of the image or data structure as reflected in the detail wavelet coefficients. In this paper, we turn our attention to information revealed in the smooth wavelet transform. This provides a broad representation of the variable of interest and avoids issues associated with noise in the data that is contained in the detail wavelet transform planes.
The symmetric B_{3} spline filter has identified two primary clusters in the smooth corn yield wavelet plane. Areas of high corn yield, defined as bushels of corn produced per harvested acre, seem to be geographically concentrated in the northwest corner of Iowa extending to southern Minnesota and the eastern parts of South Dakota and Nebraska. A second cluster is identified in Indiana, Illinois, and western Ohio. These states collectively accounted for 70 percent of 1996's record total corn production (USDA, NASS 1998). A reconstructed imaged based only on the smooth representation of the data contained in the c_{3} resolution levels would amount to a linear filtering of the information represented by the detail coefficients. This, by itself, provides a broad interpretation of the data in that those areas of the country where corn production is geographically concentrated is identified. Some interesting artifacts also are revealed in the detail resolution levels. Because of the impact of irrigation, varietal differences, and other natural resource characteristics, the highest average corn yields tend to occur in isolated areas of the country. For example, in 1996 the highest state average corn yields were for Washington, New Mexico, Arizona, California, and Oregon. Yet, these states contributed less than one percent of total production. Some of these occurrences are faintly visible in the detail coefficients. Other instances of high yields appear outside the region identified in the smooth resolution.
It is easy to see that economic dependence on wheat is invariant to state boundaries. Economic dependence on wheat, defined as the share of total farm production represented by wheat, identifies the sensitivity of the farm or farm household to changes in the price of wheat. The greater the economic dependence, the more significant the impact on farm finances from price changes. Four clusters of varying size and significance are identified in the smooth wavelet transform. The largest cluster is located in the western half of Kansas and extends south into Oklahoma and Texas, and north into portions of Colorado and Nebraska. The second largest cluster covers much of North Dakota extending into the northeast corner of Montana. A third cluster is identified in the northern half of Montana. The fourth cluster straddles the northcentral part of Oregon and southern Washington.
The information represented in the smooth wavelet transform plane shows those areas of the country where impacts of changes in the prices of wheat would have the most significant impact on farm income. As was demonstrated for the corn yield analysis, other more isolated areas where there are farms that depend on wheat for a large portion of their income, are identified in the detail resolution levels. In contrast, with that example, there seems to be a greater number of these areas identified for economic dependence on wheat. They also tend to be more spread out, geographically.
There are various methods to formalize interpretation of geographically referenced data in the multiresolution representation. The content of a wavelet plane can be summarized through statistics related to the number of objects present, their maximum and average size, and other easily determined characteristics. An efficient clustersignificance testing procedure (Slezak, et al. 1998) can be applied to more directly model significant structures revealed in the mutiresolution representation of the data. In addition to the detection and extraction of information, there are various ways to analyze empirical wavelet coefficients that enhance this type of cluster analysis or pattern recognition. Graphic displays of wavelet coefficients provide good diagnostics for data analysis since the data or function of interest is completely captured in the wavelet coefficients. Image querying, multiresolution editing, and multiresolution plots provide alternative means of visually exploring complex relationships in the data. Other statistical methods or approaches to graphical data analysis, such as box plots and QQ plots, can also be applied to wavelet coefficients.
A foundation for the application of wavelets to spatial analysis of complex multivariate data has been put forth. The exploratory data analysis inherent in the wavelet transformation and resulting maps provide a means of visually conveying tremendous amounts of information. This portrayal of the complex relationship between economic attributes and demographic characteristics makes an important contribution to understanding the spatial implications of government agricultural policy.
A more rigorous analysis of the data, would involve detection and filtering of noise. Data dependent noise modeling may be based on the noise model posed for the input data with significance levels determined at successive resolution levels using the classical hypothesis testing approach. One of the problems with wavelet nonparametric regression smoothing of this type is the flexibility offered by the wide array of choices available in terms of parameters to be specified and estimation methods. In addition to the choice of thresholding scheme, other important factors are: the analysis and synthesis filters, determination of the threshold, and the number of levels. This particular example also is complicated by implications for i.i.d. assumptions when dealing with complex sample data. While settlement of these issues is beyond the scope of this paper, their consideration will further waveletbased estimation of geographicallyreferenced multivariate surfaces.
The spatial analysis developed and demonstrated in this paper could be augmented by further integration of demographic characteristics and resourcebased attributes. These types of applications might examine how geographic dependencies change over time and the extent to which natural resource characteristics influence these changes. Implementation would involve either extending the nonparametric model to accommodate higher dimensions, or a more complex postprocessing activity focused on merging geographic representations.
In comparing density estimation methods (an adaptive kernel density estimator and the maximum penalized likelihood method) and the à trous wavelet transform using extensive 1D simulations (Fadda, et al., 1998) finds that the wavelet transform performs well when there are local gaps in the data distribution, and when small substructures are superimposed on larger structures. (They also argue for the robustness of kernel methods.) To this we would add the wide range of possibilities for exploratory visualization, exemplified in this paper, when using the wavelet transform. Finally, we would stress the importance of a multiscale transform approach whenever our data is taken as a set of superimposed or compounded resolution scalerelated phenomena.
Antoniadis, A. and D.T. Pham, 1995. Wavelet regression for random or irregular design. Technical Report, University of Grenoble.
DeVore, R.A. and B.J. Lucier, 1992. Fast wavelet techniques for near optimal processing. In Proceedings of the IEEE Military Communications Conference, pp. 48.3.148.3.7.
Donoho, D.L., 1993. Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Proceedings of Symposia in Applied Mathematics, 47, pp. 173205.
Donoho, D.L. and I.M. Johnstone, 1992. Nonlinear solution for linear inverse problems by waveletvaguelet decomposition. Technical Report 403, Department of Statistics, Stanford University, Stanford, CA.
Donoho, D.L., I.M. Johnstone, G. Kerkyacharian, and D. Picard, 1995. Wavelet shrinkage: Asymptopia? Journal of the Royal Statistical Society, Series B, 57, pp. 301369.
Fadda, D., E. Slezak, and A. Bijaoui, 1998. Density estimation with nonparametric methods. Astronomy and Astrophysics Supplement Series, 127, pp. 335352.
Fan, J., and S. Marron, 1994. Fast implementations of nonparametric curve estimators. Journal of Computational and Graphical Statistics, 3, pp. 3556.
Hall, P. and M. Wand, 1996. On the accuracy of binned kernel density estimators. Journal of Multivariate Analysis, 56, pp. 165184.
Härdle, W., and D. Scott, 1992. Smoothing by weighted averaging of shifted points. Computational Statistics, 7, pp. 97128.
Holschneider, M., KronlandMartinet, R., Morlet, J., and Tchamitchian, Ph., 1989. A realtime algorithm for signal analysis with the help of the wavelet transform. In J.M. Combes, A. Grossman and Ph. Tchamitchian, Eds., Wavelets: TimeFrequency Methods and Phase Space (Berlin: SpringerVerlag), pp. 286297.
Kerkyacharian, G. and Piccard D., 1992. Density estimation in Besov spaces. Statistics and Probability Letters, 13, pp. 1424.
Kerkyacharian, G. and Picard, D., 1993. Density estimation by kernel and wavelet methods: Optimality of Besov spaces. Statistics and Probability Letters, 18, 327336. Multi Resolutions Ltd., MR/1: The Multiresolution Analysis Software, User Manual, Version 1.0, p. 100. (http://www.visitweb.com/multires or email multires@compuserve.com).
Scott, D.W., 1992. Multivariate Density Estimation: Theory, Practice, and Visualization (New York, NY: Wiley).
Scott, D.W. and Whittaker, G., 1996. Multivariate applications of the ASH in regression. Communications in Statistics A: Theory and Methods, 25, pp. 2,5212,530.
Shensa, M.J., 1992. Discrete Wavelet Transforms: Wedding the à trous and Mallat algorithms. IEEE Transactions on Signal Processing, 40, pp. 2,4642,482.
Slezak, E., A. Bijaoui, C. Balkowski, and P. Fonatanelli, 1998. Galaxy counts in the coma supercluster field: automatic image detection and classification. Astronomy and Astrophyysics Supplement Series, 74, pp. 83106.
Starck, J.L., Murtagh, F., and Bijaoui, A., 1995 , Multiresolution support applied to image filtering and decomposition. Graphical Models and Image Processing, 57, pp. 420431.
Starck, J.L., Murtagh, F., and Bijaoui, A., 1998. Image Processing and Data Analysis: The Multiscale Approach (Cambridge: Cambridge University Press). U.S. Department of Agriculture, 1998, Field Crop, Final Estimates 199297, Statistical Bulletin No. 947, National Agricultural Statistics Service, Washington, D.C.
Walter, G.G., 1992. Approximation of the delta function by wavelets. Journal of Approximation Theory, 71, pp. 329343.
Wand, M.P. and Jones, M.C., 1994. Kernel Smoothing (London: Chapman & Hall).
Werthenbach, C. and Herrmann, E., 1998. A fast and stable updating algorithm for bivariate nonparametric curve estimation. Journal of Computational and Graphical Statistics, 7,(1), pp. 6176.