|
The project leveraged GeoVISTA Studio as an environment
for building and testing software applications focusing on exploratory
spatial data analysis applied to cancer and related risk factor
data. A brief summary of software advances follows:
Visual Inquiry Toolkit (VIT): The Visual Inquiry Toolkit is a software
application built with existing and new visual and analytical components.
Its distinguishing feature (in relation to most previous visual
analytics tools) is a capacity to combine advanced visual, computational,
and statistical methods to support both data-driven and model-driven
analyses, interactively. It helps analysts identify and examine
spatial, temporal, and high-dimensional clusters, patterns, and relationships
at multiple scales. The VIT employs an array of data mining methods
(e.g., spatially constrained hierarchical clustering, self-organizing
maps) and statistical methods (e.g., Moran's I, connections to SaTScan),
integrating them through highly interactive, dynamically linked
visualization methods (e.g., reorderable matrix, parallel coordinate
plots) and graphical user interfaces. The VIT can feed the output
from one method (e.g., a model-driven method) into other methods
(e.g., data-driven ones), facilitating quick hypothesis generation
and verification. The VIT also supports saving intermediate findings
during data analysis, thus enabling incremental analysis across
multiple sessions (e.g., over weeks or months). The VIT is a component-based
framework that can be customized to support studies for different
research topics and domains.
The GeoViz Toolkit: Like the Visual Inquiry Toolkit, the GeoViz
Toolkit is partially derived from GeoVISTA Studio code. A core goal
in developing GeoViz was to make it easier for end users to begin
analyzing their data. It accomplishes this through a flexible user
interface that allows non-programmers to assemble their own custom
applications quickly and by supporting standard data formats to
make data input easier. The GeoViz Toolkit includes versions of
the RadViz, PROCLUDE, and Moran's I tools described below along
with a wide range of mapping, statistical graphics, and other analytical
methods.
RadViz: The RadViz component in GeoVISTA Studio is an interactive
tool for visualizing highly multivariate geographical datasets.
It is based on the RADVIZ visualization method (introduced by Ankerst
and colleagues; Ankerst, 1998), which maps data points in
a high-dimensional data space to a two-dimensional space. The
addition of RadViz to GeoVISTA Studio enhances Studio's capability
for multivariate data analysis and visualization. The RadViz tool
is implemented in a way that facilitates interactive exploration
of multivariate geographical datasets. Its integration into the
GeoViz Toolkit makes this novel analytical method easy to link with
other tools and more accessible to users.
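The mapping from a high-dimensional space to two dimensions can be illustrated with a minimal sketch of the general RadViz idea. It is not the GeoVISTA Studio component's code: the function name `radviz_project`, the min-max normalization, and the even spacing of dimension anchors on a unit circle are assumptions made for illustration. Each observation is placed at the weighted average of the anchors, with its normalized attribute values as weights.

```python
import math


def radviz_project(rows):
    """Project equal-length numeric rows to 2D RadViz coordinates.

    Hypothetical helper for illustration; not part of GeoVISTA Studio.
    """
    n_dims = len(rows[0])
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # Avoid division by zero for constant columns.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    # One anchor per dimension, evenly spaced on the unit circle.
    anchors = [
        (math.cos(2 * math.pi * j / n_dims), math.sin(2 * math.pi * j / n_dims))
        for j in range(n_dims)
    ]
    points = []
    for row in rows:
        # Min-max normalize each value to [0, 1]; these act as spring weights.
        weights = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(weights)
        if total == 0:
            # All-minimum observations have no pull; place them at the center.
            points.append((0.0, 0.0))
            continue
        x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
        y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
        points.append((x, y))
    return points
```

An observation that is high on only one variable lands on that variable's anchor, while an observation with equal normalized values on all variables lands at the center of the circle.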
PROCLUDE: PROCLUDE (PROgrams for CLUster DEtection) is a
toolkit application containing multiple spatial cluster detection
routines. As part of the grant reported here, a new genetic algorithm-based
cluster detection method was developed and implemented in PROCLUDE.
PROCLUDE also contains new Java-based implementations of three established
cluster detection methods: Openshaw's Geographical Analysis Machine
(GAM), case point centered searching (proposed by Besag and Newell),
and randomized GAM (proposed by Fotheringham and Zhan). In a related
effort, an approach to using genetic algorithms to configure the
"flock of boids (bird objects)" method of cluster detection
has been implemented and tested. As noted above, PROCLUDE has been
integrated into the GeoViz Toolkit, making it possible to explore
its output using the full range of GeoViz tools.
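To make the circle-based search behind GAM-style methods concrete, here is a simplified sketch. It is not PROCLUDE's implementation. It assumes areas are represented by centroid points with case and population counts, tests each circle with a Poisson upper-tail probability against the overall rate, and applies no correction for multiple testing. The names `gam_scan` and `poisson_upper_tail` are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cumulative = 0.0
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cumulative)


def gam_scan(areas, radii, step, alpha=0.01):
    """Scan a regular grid of circles and return those with excess cases.

    areas: list of (x, y, cases, population) tuples (assumed centroid data).
    Returns a list of (cx, cy, radius, observed, expected) tuples.
    """
    rate = sum(a[2] for a in areas) / sum(a[3] for a in areas)
    min_x, max_x = min(a[0] for a in areas), max(a[0] for a in areas)
    min_y, max_y = min(a[1] for a in areas), max(a[1] for a in areas)
    nx = int((max_x - min_x) / step) + 1
    ny = int((max_y - min_y) / step) + 1
    hits = []
    for radius in radii:
        for ix in range(nx):
            for iy in range(ny):
                cx, cy = min_x + ix * step, min_y + iy * step
                inside = [a for a in areas
                          if math.hypot(a[0] - cx, a[1] - cy) <= radius]
                population = sum(a[3] for a in inside)
                if population == 0:
                    continue  # Empty circle: nothing to test.
                observed = sum(a[2] for a in inside)
                expected = rate * population
                if poisson_upper_tail(observed, expected) < alpha:
                    hits.append((cx, cy, radius, observed, expected))
    return hits
```

In practice, overlapping significant circles are drawn on a map so that dense piles of circles indicate candidate clusters. The significance test here is a stand-in chosen for brevity.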
G-EX Portal: The Geovisual EXplication (G-EX) Portal is a web-based
geocollaboration and resource sharing tool that supports the use
of the aforementioned ESDA and geovisualization tools and the interpretation
of results derived from these tools (see figures above). Prior to
analysis, users can access the G-EX Portal to download software
and related learning materials (e.g., training manuals, multimedia
tutorials, example problems). Following analysis, users can
upload artifacts generated during analysis sessions and discuss
them asynchronously with collaborators.
GeoDa: GeoDa is a suite of exploratory spatial data analysis (ESDA) tools
developed at the University of Illinois by Co-PI Anselin, primarily
as part of the NSF-funded Center for Spatially Integrated Social
Science. In this NCI project, we have converted spatial analytical
methods from GeoDa into open-source Java components for integration
with other open-source components within GeoVISTA Studio. The converted
methods construct contiguity information from geographic
boundary files, smooth rates (including spatial smoothing), calculate and
visualize global spatial autocorrelation, and perform inference
for Moran's I using random permutation. The next version of GeoDa
(OpenGeoDa) will be entirely open source, relying on wxWidgets instead
of the proprietary MapObjects.
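Permutation inference for Moran's I can be sketched as follows. This is an illustrative sketch under stated assumptions, not the converted GeoDa code. The weights are assumed to be a dictionary mapping each observation index to a dictionary of neighbor weights, the pseudo p-value is one-sided (testing for positive autocorrelation), and the function names are hypothetical.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and {i: {j: w_ij}} weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    denominator = sum(d * d for d in z)
    if denominator == 0:
        raise ValueError("Moran's I is undefined for constant values")
    s0 = sum(w for nbrs in weights.values() for w in nbrs.values())
    numerator = sum(w * z[i] * z[j]
                    for i, nbrs in weights.items()
                    for j, w in nbrs.items())
    return (n / s0) * numerator / denominator


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """Return (observed I, one-sided pseudo p-value) by random permutation."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Reassign values to locations at random, keeping weights fixed.
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

Shuffling the values while holding the weights fixed simulates the null hypothesis of no spatial structure. The pseudo p-value is the share of permutations, counting the observed arrangement itself, that produce a statistic at least as large as the observed one.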
PySpace: PySpace is an open source software development effort to implement
spatial statistical methods in general and spatial regression analysis
in particular using Python and Numerical Python. Current activities
focus on a set of classes and methods to carry out diagnostics for
spatial correlation in linear regression models and to estimate
spatial lag and spatial error specifications. In this NCI project,
we have both demonstrated application of the methods to exploration
of cancer mortality rates for counties in the Appalachia Cancer
Network and have developed tools in Jython (an implementation of
the high-level, dynamic, object-oriented language Python written
in 100% Pure Java) that allow us to incorporate PySpace methods
within Java-based applications built using GeoVISTA Studio and GeoTools
components.
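For readers unfamiliar with the term, a spatial lag specification adds a weighted average of neighboring values of the dependent variable (commonly written Wy) to the regression. The sketch below only computes that lagged variable with row-standardized weights. It does not estimate a model and does not use PySpace's API, and the function names and the neighbor-list input format are assumptions made for illustration.

```python
def row_standardize(neighbors):
    """Turn {id: [neighbor ids]} into {id: {neighbor id: weight}}.

    Each unit's weights sum to 1; units without neighbors get no weights.
    """
    return {
        unit: ({nbr: 1.0 / len(nbrs) for nbr in nbrs} if nbrs else {})
        for unit, nbrs in neighbors.items()
    }


def spatial_lag(y, weights):
    """Return {id: weighted average of neighbors' y values}."""
    return {
        unit: sum((w * y[nbr] for nbr, w in nbrs.items()), 0.0)
        for unit, nbrs in weights.items()
    }


# Hypothetical example: unit "a" borders "b" and "c".
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
rates = {"a": 10.0, "b": 20.0, "c": 40.0}
lagged = spatial_lag(rates, row_standardize(neighbors))
```

With row-standardized weights, the lag for each unit is simply the mean of its neighbors' values, which is why this form is a common default.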
Applets: A series of web applets was built using GeoVISTA
Studio to combine components developed in this project with
those developed previously by the GeoVISTA research team or by developers
at other institutions. These applets illustrate the kinds of health
data analysis that the tools under development can support. While the
applets do not support input of user data, they do allow anyone
who accesses them to explore the data sets we provide (usually data sets
on cancer and related statistics downloaded from the NCI and other
agency web sites). Users can try their own data with the tools by
accessing Studio.
All applets require Java
Plug-in 1.3 or later with JavaScript enabled. To view
an applet, click on its respective image. Please respect this software's
licenses (see below).
| |
|
Univariate Map |
|
This map applet features interactive spatial tools and
color selection tools. Users can create a multipart line
through color space to better visualize spatial patterns.
|
|
Author(s): Frank Hardisty |
| |
|
Bivariate Map |
|
Users can create bivariate color schemes that reveal
spatial patterns in two variables simultaneously.
|
|
Author(s): Frank Hardisty |
| |
|
Parallel Coordinate Plot |
|
Users can see multivariate statistical patterns using
this interactive tool. |
|
Author(s): Flo Lederman (Parvis),
Frank Hardisty |
| |
|
Map and Scatterplot Matrix |
|
This image shows a map and scatterplot matrix. In the
scatterplots, the position and coloring of the observations
are determined by the variables on the diagonal, while
in the map, position is geographic and color is determined
by the variables. Using this tool, we can see attribute
and geographic relationships among multiple variables
at a time. |
|
Author(s): Xiping Dai, Frank Hardisty |
| |
|
Map and Linkgraph |
|
This applet contains a map linked to a node-and-edge
graph representation of U.S. states, where states are
nodes and links between them are determined by their
similarity in one or more variables, as found by the minimum
spanning tree in attribute space. We can use this tool
to explore the geographic relationships between similar
observations. |
|
Author(s): Alex Shapiro (Touchgraph),
Frank Hardisty, Diansheng Guo |
| |
|
Subspace Browser |
|
Here the nodes are variables, and the graph shows the
minimum spanning tree based on their correlation coefficients.
This is an easy way to rapidly browse relationships between
many variables. |
|
Author(s): Alex Shapiro (Touchgraph),
Frank Hardisty, Diansheng Guo |
| |
|
| |
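The graph behind the Subspace Browser, a minimum spanning tree over variables based on their correlation coefficients, can be approximated with the sketch below. It is not the applet's code. The distance `1 - |r|` (so strongly correlated variables are close), the use of Kruskal's algorithm, and the function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_mst(columns):
    """Return MST edges (a, b, distance) over variables in {name: values}."""
    # Assumed distance: 1 - |r|, so highly correlated variables are near.
    edges = sorted(
        (1.0 - abs(pearson(columns[a], columns[b])), a, b)
        for a, b in combinations(sorted(columns), 2)
    )
    parent = {name: name for name in columns}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for distance, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # Edge joins two components: keep it.
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree
```

The resulting tree has one fewer edge than there are variables, so even dozens of variables yield a sparse graph in which each variable is attached to the one it resembles most.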
Source code for the above software will be available soon.
All software is licensed under the GNU Lesser General Public License,
with the following exceptions:
The Parallel Coordinate Plot is based on Parvis, so its
code is licensed under the GNU General Public License.
The Touchgraph-based Linkgraph is Apache licensed.
|
|