The system architecture for the
Apoala project consists of the following three
interrelated modules: a temporal GIS data management
system known as Tempest, a data mining software package
known as AutoClass, and a set of geographic visualization
(geovisualization) tools developed in IBM Data Explorer and Tcl/Tk.
The goal of the database management system is quick
retrieval and updating of very large data sets using
parallel processing and other high-performance computing
techniques. Data mining, as the term is used here, is the
use of statistical techniques for unsupervised
classification of very large data sets in an attempt to
find patterns that would not otherwise be discernible.
Geographic visualization techniques, in contrast to data
mining, display data in a way that allows a domain expert
to take advantage of the nature of human visual processes
to find patterns.
The challenge that we are
addressing is moving data from one module to another in a
way that is simple and invisible to the user. For
instance, a synoptic climatologist may examine data from
daily weather station readings throughout the Susquehanna
drainage basin when looking for patterns in
high-intensity rainfall events. One possible approach to
this problem would be to use data mining software to
classify the data. In order to run this classification,
the data would first need to be preprocessed to select the
appropriate subset. Once the classification has been run,
it can be visualized in order to understand which
variables play a significant role in the classification,
and where and when cases classified into a certain group
are likely to occur. This may lead to the development of
hypotheses that can be tested through the collection of
other data, or can be explored through further
examination of existing data.
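
The workflow in this example can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
Reading record, the station labels, the rainfall values, and the
25 mm threshold are all made up, and the classify function is a
simple one-dimensional k-means used as a stand-in for the data
mining step. It is not the AutoClass algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """Hypothetical daily weather station record."""
    station: str
    day: date
    rainfall_mm: float


def select_subset(readings, min_rainfall_mm):
    """Preprocessing: keep only the high-intensity rainfall events."""
    return [r for r in readings if r.rainfall_mm >= min_rainfall_mm]


def classify(readings, k=2, iterations=10):
    """Group readings by rainfall with 1-D k-means (stand-in, not AutoClass)."""
    if len(readings) < k:
        raise ValueError("need at least k readings")
    values = sorted(r.rainfall_mm for r in readings)
    # Seed the centres evenly across the sorted values.
    centres = [values[i * (len(values) - 1) // max(k - 1, 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in readings:
            nearest = min(range(k), key=lambda i: abs(r.rainfall_mm - centres[i]))
            groups[nearest].append(r)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(m.rainfall_mm for m in g) / len(g) if g else c
                   for g, c in zip(groups, centres)]
    return groups


def where_and_when(group):
    """Count the cases in one class by (station, month)."""
    return Counter((r.station, r.day.month) for r in group)


# Made-up sample values, for illustration only.
READINGS = [
    Reading("station_a", date(1996, 1, 19), 30.0),
    Reading("station_a", date(1996, 7, 4), 4.0),
    Reading("station_b", date(1996, 6, 12), 32.0),
    Reading("station_b", date(1996, 9, 6), 80.0),
    Reading("station_c", date(1996, 9, 6), 85.0),
    Reading("station_c", date(1996, 3, 2), 10.0),
]

subset = select_subset(READINGS, min_rainfall_mm=25.0)
groups = classify(subset, k=2)
for label, group in enumerate(groups):
    print(label, dict(where_and_when(group)))
```

The per-class counts by station and month correspond to the "where
and when" question in the text; in practice that step would be a
visualization rather than a printed table.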
In the example given above there
is a very simple flow of data from the database to the
data mining software, and finally to the visualization
module. In reality the flow may be much more complex. A
visualization of the original data set may lead to the
extraction of a subset of data that is then mined. The
mining process, which is iterative, may be interactively
visualized, and the results may be stored as a final
product or facilitate further visualization, database
queries, or mining. The challenge, then, is creating a
stable environment within which this complex interaction
is, for the most part, invisible to the user so he or she
may focus on interpreting the data.
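
One way to picture this kind of environment is a session object
that lets any module's output feed any other module, in any order,
while keeping a record of what was run. The sketch below is a
hypothetical illustration, not the Apoala design: the Session class,
the list-of-dicts exchange format, and the three placeholder modules
are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Dataset = List[dict]  # shared exchange format assumed for this sketch


class Session:
    """Hypothetical coordinator that moves data between registered modules."""

    def __init__(self) -> None:
        self._modules: Dict[str, Callable[[Dataset], Dataset]] = {}
        self.history: List[str] = []

    def register(self, name: str, module: Callable[[Dataset], Dataset]) -> None:
        self._modules[name] = module

    def run(self, data: Dataset, *steps: str) -> Dataset:
        """Apply the named modules in order, handing each one the previous output."""
        for name in steps:
            data = self._modules[name](data)
            self.history.append(name)
        return data


def query_heavy_rain(rows: Dataset) -> Dataset:
    """Placeholder database query: keep rows with at least 25 mm of rain."""
    return [row for row in rows if row["rain_mm"] >= 25.0]


def mine(rows: Dataset) -> Dataset:
    """Placeholder mining step: a fixed labelling rule, not a real classifier."""
    return [{**row, "group": "extreme" if row["rain_mm"] >= 60.0 else "heavy"}
            for row in rows]


def visualize(rows: Dataset) -> Dataset:
    """Placeholder display step: order rows from wettest to driest."""
    return sorted(rows, key=lambda row: row["rain_mm"], reverse=True)


session = Session()
session.register("query", query_heavy_rain)
session.register("mine", mine)
session.register("visualize", visualize)

# Made-up rows, for illustration only.
rows = [
    {"station": "a", "rain_mm": 5.0},
    {"station": "b", "rain_mm": 40.0},
    {"station": "c", "rain_mm": 75.0},
]

# Visualize the raw data, extract a subset, mine it, then visualize the result.
result = session.run(rows, "visualize", "query", "mine", "visualize")
```

Because every module accepts and returns the same structure, the
order of steps is up to the user, and the recorded history makes it
possible to repeat or revise an exploration later.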
The database management system
that we are using in Apoala is based on the triad
database model, developed and implemented in previous
research as a temporal GIS called Tempest.
Tempest provides extremely fast
access to data using a number of tools, including linear
and cyclical queries.
It also provides a number of
simple summary statistics for spatial and/or temporal
subsets of the data.
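
The difference between the two query types, and the kind of summary
statistics described, can be illustrated with a short Python sketch.
It assumes that a linear query selects one contiguous span of time
and that a cyclical query selects a recurring part of a cycle (here,
a calendar month across all years). The record layout, the function
names, and the sample values are hypothetical and do not reflect the
Tempest interface.

```python
from datetime import date
from statistics import mean

# (day, station, longitude, latitude, rainfall_mm): hypothetical layout
# with made-up values.
RECORDS = [
    (date(1995, 1, 15), "a", -76.9, 40.3, 12.0),
    (date(1995, 7, 4), "a", -76.9, 40.3, 48.0),
    (date(1996, 7, 10), "b", -76.0, 41.2, 66.0),
    (date(1996, 12, 1), "b", -76.0, 41.2, 8.0),
    (date(1997, 7, 21), "c", -77.8, 41.0, 30.0),
]


def linear_query(records, start, end):
    """Records inside one contiguous time span (inclusive)."""
    return [r for r in records if start <= r[0] <= end]


def cyclical_query(records, months):
    """Records falling in the given calendar months of any year."""
    return [r for r in records if r[0].month in months]


def spatial_subset(records, west, south, east, north):
    """Records whose station lies inside a longitude/latitude box."""
    return [r for r in records if west <= r[2] <= east and south <= r[3] <= north]


def summarize(records):
    """Simple summary statistics for the rainfall values of a subset."""
    values = [r[4] for r in records]
    if not values:
        return {"count": 0}
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


span_1996 = linear_query(RECORDS, date(1996, 1, 1), date(1996, 12, 31))
julys = cyclical_query(RECORDS, {7})
east_julys = spatial_subset(julys, west=-77.0, south=40.0, east=-75.5, north=41.5)

july_stats = summarize(julys)
east_july_stats = summarize(east_julys)
```

Combining the temporal and spatial filters before summarizing
mirrors the "spatial and/or temporal subsets" described above.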