Gail Kucera
Mercator Systems Ltd., 2936 Phyllis Street, Victoria, BC V8N 1Z1, Canada
E-mail: Kucera@islandnet.com
This paper discusses pluralism in spatial information systems. Pluralism is required when different versions of the same information are available to describe a geographic area or feature. The discussion draws on experiences in three recent development projects that, combined, represent a broad spectrum of requirements and techniques for managing pluralism. The first development resulted in an operational system to manage 100+ years of land tenure records for the public lands that encompass over 90% of British Columbia. The second development involves ongoing R&D funded by the US Army Topographic Engineering Center, and will result in a pluralistic spatial data warehouse to manage multi-source, multi-resolution information with linked "same-as" feature networks to support spatial drill-down/up, and generation of a "best map" at a requested scale. The third development is a spatial data warehouse for a national forest information system for the Canadian Forest Service.
Many spatial systems require more than a simple monolithic representation of information. The need to extend an information system to model a more pluralistic world can occur because the available data or the application domain itself is pluralistic. Three prime examples of pluralistic requirements follow.
Global coverage of geographic data is relatively small-scale and comparatively stale. Larger-scale and fresher data are sometimes available, but they may be missing the spatial continuity, or even some features, available at the smaller scale. Ideally, in areas of duplicate data coverage, the pluralistic data can be integrated into a monolithic "best map" by choosing the "best" of each version of a given feature or geometric description. The definition of "best" may change for different purposes, however. For example, the most current information may be Best for choosing a route on a transportation network, the highest-resolution information may be Best for finding the shortest route on the transportation network, and the richest thematic attribution may be Best during the rainy season in an area with poorly developed roads. By storing all renditions of information in a pluralistic database, an application can derive the Best monolithic representation using specialized query tools. A later section on query requirements provides further insight into the concept of a Best Map.
Data warehouses often manage information collected by different organizations using different collection methods and categorical schemes. Duplicate area coverage exists, but information can be difficult to compare because the sampling or measurement methods differ. In some cases, different data sources obviously disagree and provide a completely different picture of a situation. A monolithic system would need to reconcile all such inconsistencies before data are loaded; a pluralistic system loads all data and provides application tools to explore data and possibly to reconcile discrepancies based on all information sources available up to the moment of query. Two conflicting sources may be irreconcilable, but a third that arrives later may solve the puzzle, without devaluing aspects of the information contained in the prior two.
Multi-temporal applications require some means of managing versions of information as it changes over time. The two critical considerations that drive a technical approach are whether the entities that change are persistent or not, and at what level of detail change is described. Persistent entities remain recognizable as the same thing, regardless of change, and can be queried to examine their metamorphosis. Non-persistent entities have a period of effectiveness bracketed by a birth and a death, but no changes occur during that period. In either case, the database accumulates multiple representations of an area or its entities, only some of which are valid at a given moment.
This paper defines the concept of pluralistic information management, then describes particular cases where pluralistic techniques are useful. Some data modelling techniques for managing pluralism are discussed, and special query requirements are illustrated.
A pluralistic system permits data to be heterogeneous in terms of source, resolution, area coverage, currency, categorical description, and quality. In general, data are stored "as-is", as received from the provider, without attempting to integrate schemas, generalize, resolve spatial discontinuity, or otherwise encourage consistency in representation. One exception to this rule is that it is often advisable to convert all coordinates to the same spatial reference, and all dates to the same temporal reference.
In contrast, a monolithic database reconciles any pluralism as part of the update process. New information that differs from existing information is evaluated, and if "better," it replaces the "worse" information. The monolithic database holds a single "best map" compiled by integrating available information.
In practice, one feature in a pluralistic system may have multiple representations acquired from various sources. Depending on the pluralistic techniques used, the multiple representations may be linked, or they may be independent, requiring interpretation to determine their relationship. In either case, information may disagree, and disagreements are not reconciled. Application software provides various tools for working with the pluralistic data. Metacontent and special queries therefore play a powerful role in the usability of a pluralistic system.
Versioning techniques for temporal information, described in some detail in the spatial information management literature, also are important tools when managing other forms of pluralistic information. The following are some examples.
The effective dates used to describe currency in temporal data can be extended to include other metacontent that describes resolution, accuracy, schema, collection methods, or other factors that make data more or less Good in the context of a Best Map. A purely temporal query might ask for all data whose effective dates bracket the moment in question; a more general pluralistic query would ask for all data whose resolution, accuracy, or other descriptors fell in an acceptable range for the purpose at hand. A strong metacontent strategy is obviously crucial to a pluralistic system. Figure 1 illustrates a basic metacontent model.

Figure 1. A basic metacontent model that describes the quality and lineage of a feature. This information can be used to select among pluralistic versions of a feature.
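The difference between a purely temporal query and a more general pluralistic query can be sketched in a few lines of Python. This is a minimal illustration only, not the schema of any of the systems described here: the `FeatureVersion` record, its field names, and the use of a scale denominator as the resolution descriptor are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FeatureVersion:
    """Hypothetical metacontent record for one stored version of a feature."""
    feature_id: str
    source: str
    scale_denominator: int         # e.g. 50000 for a 1:50,000 source
    effective_from: date
    effective_to: Optional[date]   # None means the version is still in effect


def valid_at(version: FeatureVersion, moment: date) -> bool:
    """True when the effective dates bracket the moment in question."""
    ended = version.effective_to is not None and moment >= version.effective_to
    return version.effective_from <= moment and not ended


def temporal_query(versions: List[FeatureVersion], moment: date) -> List[FeatureVersion]:
    """Purely temporal query: every version in effect at the given moment."""
    return [v for v in versions if valid_at(v, moment)]


def pluralistic_query(versions: List[FeatureVersion], moment: date,
                      max_scale_denominator: int) -> List[FeatureVersion]:
    """Temporal test plus one further descriptor (resolution) within range."""
    return [v for v in temporal_query(versions, moment)
            if v.scale_denominator <= max_scale_denominator]
```

Further descriptors such as accuracy or collection method could be tested in the same way, provided the metacontent records them consistently.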
Persistence is modelled in the temporal domain by linking different versions of a feature using a common ID or other connecting information. Land tenure information for British Columbia has been modelled to show persistence in a land disposition, despite changes in the lease document, terms, parcel extent, client, land use, etc. Similarly, a persistent identity of one feature across its multiple representations can be described by explicit links that express the concept of "same as". Links can be between entire features if they are discrete, or between portions of line or area features. Figure 2 illustrates a scenario of two linked line features, as is being implemented in a development project for the US Army Topographic Engineering Center.

Figure 2. Example of linear feature links. A feature can be fully linked to one other feature, fully linked to several other features, partly linked to one or more features, and other variations.
Note that links are between portions of features, not entire features. A feature-linking service that can take two datasets and link the geometric elements in each has been developed by Intergraph Corporation in a related part of this same project.
A link is essentially a feature type that is stored in the warehouse. The link for a line feature is composed of the identifiers of the two features linked, four points (the beginning and end points of the portions of the features linked), and descriptors of the certainty of the link and parameters involving its detection.
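As an illustration, the link record described above might be represented as follows. The class name, field names, the 0 to 1 certainty range, and the `links_for` helper are hypothetical; only the components of the link (two feature identifiers, four points, a certainty descriptor, and detection parameters) come from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class LineLink:
    """Hypothetical "same-as" link between portions of two line features."""
    feature_a: str
    feature_b: str
    a_start: Point   # start of the linked portion on feature A
    a_end: Point     # end of the linked portion on feature A
    b_start: Point   # start of the linked portion on feature B
    b_end: Point     # end of the linked portion on feature B
    certainty: float                     # assumed to lie between 0 and 1
    detection: Dict[str, float] = field(default_factory=dict)

    def partner_of(self, feature_id: str) -> str:
        """Return the identifier of the feature at the other end of the link."""
        if feature_id == self.feature_a:
            return self.feature_b
        if feature_id == self.feature_b:
            return self.feature_a
        raise ValueError(f"{feature_id} is not part of this link")


def links_for(feature_id: str, links: List[LineLink],
              min_certainty: float = 0.0) -> List[LineLink]:
    """All links that touch a feature and meet a certainty threshold."""
    return [link for link in links
            if feature_id in (link.feature_a, link.feature_b)
            and link.certainty >= min_certainty]
```

Because a feature may be partly linked to several others, a lookup of this kind returns a list rather than a single partner.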
A mapping between the various categories and domains used by different sources or epochs is a necessary utility for a user-friendly pluralistic system. Such a utility would permit a user to request "Hard-surface roads" and receive data that is variously called Paved Roads, Primary Auto Routes, Asphalt Highways, etc. At minimum, users should be able to browse an on-line data dictionary to choose features of different names and categorical breakdowns, but all meeting the basic information requirements. Even better, a system could store cross-references that link similar feature descriptions in the various schemas. The challenge for on-line schema mapping is as much in the semantics as in the technology.
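A stored cross-reference of this kind can be sketched as a simple lookup table. The source names (`source_a` and so on) and the function names are invented for the example; the category names are those mentioned above. The sketch says nothing about how the mapping is derived, which is the semantic part of the problem.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical cross-reference: one user-level concept mapped to the
# (source, category) pairs that different providers use for it.
CROSS_REFERENCE: Dict[str, Set[Tuple[str, str]]] = {
    "hard-surface roads": {
        ("source_a", "Paved Roads"),
        ("source_b", "Primary Auto Routes"),
        ("source_c", "Asphalt Highways"),
    },
}


def matches(concept: str, source: str, category: str,
            xref: Dict[str, Set[Tuple[str, str]]] = CROSS_REFERENCE) -> bool:
    """True when a source-specific category is cross-referenced to the concept."""
    return (source, category) in xref.get(concept.lower(), set())


def select_by_concept(features: List[dict], concept: str,
                      xref: Dict[str, Set[Tuple[str, str]]] = CROSS_REFERENCE) -> List[dict]:
    """Return the features whose native category maps to the requested concept."""
    return [f for f in features
            if matches(concept, f["source"], f["category"], xref)]
```

Keying the table on the pair of source and category, rather than on the category name alone, allows the same name to mean different things in different schemas.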
It is practical to store monolithic snapshots that are created from the pluralistic data at reasonable periods and resolutions. One purpose of such views is performance-related: they improve response time for casual browsers by avoiding the need to navigate pluralistic data. They also support drilling up and down through multiple resolutions by making intermediate monolithic views available through which pluralistic data are accessed. A second purpose of the monolithic view is to allow organizations to analyze the pluralistic data at leisure, then present users with a Best Map that comprises the Authorized Best Guess at a reconciled representation.
Working with a fully pluralistic database means that the following classes of queries must be supported.
Figure 3 illustrates the Best Map query. The Best Map contains the "best" data from a heterogeneous collection of features loaded from many different sources available in the area of interest. In this example, the area is covered by two sources, one of a higher resolution than the other. Note that some of the information available on the 1:250,000 source is not available on the 1:50,000 source. Thus, the Best Map includes features (or segments of features) from both sources.

Figure 3. Example of a Best Map query.
From a data perspective, a Best Map has many geometric elements. Each element is the "best" version of that spatial component available in the database. Each element is derived from exactly one stored feature, although more than one geometric element of a given feature can be included in the same Best Map. Each geometric element can have more than one attribute description: one for each alternative representation of that piece of geometry. The default attribute description is the one from the same source as the "best" geometric element, but users can elect to view alternative attribute descriptions via feature-based drilling (see below). These relationships are achieved in two steps: pre-processing, which links the geometric portions of features that are recognizably the "same"; and real-time composition of the Best Map by database query.
Figure 4 illustrates the composition of a Best Map from linked line features. The merge process must retrieve the unlinked features in the area of interest, since those are the best representation. For the linked features, it must clip away any segments that are linked to "better" segments, then splice the endpoints together. The Best Map query function is to be developed for the TEC project in 1999.
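Since the definition of "best" depends on the purpose at hand, the choice among alternative versions of one geometric element can be expressed as a ranking selected by purpose. The following sketch is illustrative only: the `Candidate` record, the three purpose names, and the use of an attribute count as a proxy for thematic richness are assumptions, not part of any system described here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical description of one version of a geometric element."""
    source: str
    collected: date
    scale_denominator: int   # smaller denominator = higher resolution
    attribute_count: int     # crude proxy for thematic richness


# Each purpose maps to a scoring function; a higher score is "better".
RANKING: Dict[str, Callable[[Candidate], int]] = {
    "most_current": lambda c: c.collected.toordinal(),
    "highest_resolution": lambda c: -c.scale_denominator,
    "richest_attributes": lambda c: c.attribute_count,
}


def pick_best(candidates: List[Candidate], purpose: str) -> Candidate:
    """Choose the best version of an element for the stated purpose."""
    return max(candidates, key=RANKING[purpose])
```

The same set of candidates can therefore yield a different Best Map element for each purpose, without any change to the stored data.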

Figure 4. Merging linked features from multiple sources to get a Best Map. The final step of achieving network connectivity may be achieved by inserting a connecting segment (tagged as such) rather than by moving either endpoint.
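The merge steps can be sketched under strong simplifying assumptions: features are already split into portions at the link endpoints, a link is a pair of portion identifiers, "better" means a smaller scale denominator, and the caller supplies the surviving portions in order along the line. None of these assumptions, nor the names used, are taken from the TEC implementation, which had not been developed at the time of writing.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Portion:
    """Hypothetical portion of a line feature, pre-split at link endpoints."""
    portion_id: str
    source: str
    scale_denominator: int
    coords: Tuple[Point, ...]
    connector: bool = False   # True for segments inserted by the merge


def best_portions(portions: List[Portion],
                  links: List[Tuple[str, str]]) -> List[Portion]:
    """Keep unlinked portions and the better member of each linked pair."""
    by_id = {p.portion_id: p for p in portions}
    discarded = set()
    for a_id, b_id in links:
        a, b = by_id[a_id], by_id[b_id]
        # On a tie the second member of the pair is discarded.
        worse = a if a.scale_denominator > b.scale_denominator else b
        discarded.add(worse.portion_id)
    return [p for p in portions if p.portion_id not in discarded]


def splice(ordered: List[Portion]) -> List[Portion]:
    """Insert a tagged connecting segment where consecutive portions do not meet."""
    result: List[Portion] = []
    for p in ordered:
        if result and result[-1].coords[-1] != p.coords[0]:
            prev = result[-1]
            result.append(Portion(
                portion_id=f"{prev.portion_id}->{p.portion_id}",
                source="derived",
                scale_denominator=max(prev.scale_denominator, p.scale_denominator),
                coords=(prev.coords[-1], p.coords[0]),
                connector=True,
            ))
        result.append(p)
    return result
```

Inserting a tagged connector, rather than moving an endpoint, leaves the source geometry untouched and lets an application distinguish derived segments from observed ones.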
Figure 5 illustrates an area-based drill-down. The user works initially with a generalized (small-scale) map to identify an area of interest, then "drills down" to acquire more detail. The user might choose to drill back up again to gain the context of a larger area, then down again to examine a different set of features in more detail. Note that each drilling operation retrieves a new "best map", with a heterogeneous set of features suited to the resolution desired.
Figure 5. Example of area-based drilling. The example shows drill-down; users also can drill up using the same general procedures.
Figure 6 illustrates a feature-based drilling operation, where the user can review alternate representations of a feature, benefiting from all sources available for a particular feature. For feature-based drilling, the user selects a feature, then requests to see its geometry and attributes from selected sources in turn.

Figure 6. Example of a feature-based drilling query.
When pluralism is multi-temporal, a user might review Previous and Next versions of a feature. When pluralism is due to multiple source representations, a user might alternate between an area-based and a feature-based drill. Area-based drilling results in a Best Map that displays a single geometric representation for each feature, regardless of how many different geometric representations are available via links. In contrast, feature-based drilling permits a user to cycle through all the versions of a feature available in the warehouse, both attribute and geometry. Thus, to perform a feature-based drill following an area-based drill, it must be possible to access other versions of a feature from the Best Map produced by the area-based drill.
A pluralistic spatial information system requires the use of various data modelling techniques, and the availability of data reconciliation applications and queries. Most such techniques are only in their infancy and must be developed with care to ensure appropriate use of the data. The reward is a more powerful and expressive information system that does not presuppose a reality that may come into question when the next data source arrives. It is assumed that a pluralistic information system will be continually enhanced with new data of improved freshness, resolution, accuracy, and thematic correctness. The pluralistic approach alleviates the need to reconcile schema, boundary, and thematic discrepancies. At the discretion of the database manager, alien datasets can be loaded into the warehouse and used with minimal pre-processing, in the interests of having rapid access to the best possible data at a given moment.
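One way to make a feature-based drill possible from a Best Map is for each Best Map element to carry references to its alternative representations. The sketch below assumes a hypothetical `BestMapElement` that holds one attribute description per source; geometry is omitted for brevity, and the ordering rule (best source first, then the others alphabetically) is an arbitrary choice for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BestMapElement:
    """Hypothetical Best Map element with its alternative descriptions."""
    element_id: str
    best_source: str
    alternatives: Dict[str, dict]   # source name -> attribute description

    def default_attributes(self) -> dict:
        """Attributes from the same source as the "best" geometry."""
        return self.alternatives[self.best_source]

    def drill(self, sources: Optional[List[str]] = None) -> List[Tuple[str, dict]]:
        """Return (source, attributes) pairs for the selected sources in turn."""
        order = [self.best_source] + sorted(
            s for s in self.alternatives if s != self.best_source)
        if sources is not None:
            order = [s for s in order if s in sources]
        return [(s, self.alternatives[s]) for s in order]
```

A user interface could step through the returned list to present each source's view of the selected feature.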