Penn State GeoVISTA Center

This material is based upon work supported by the National Institutes of Health under Grant # R01 CA95949-01

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health or the National Cancer Institute.


The project leveraged GeoVISTA Studio as an environment for building and testing software applications focusing on exploratory spatial data analysis applied to cancer and related risk factor data. A brief summary of software advances follows:

Visual Inquiry Toolkit (VIT): The Visual Inquiry Toolkit is a software application built from existing and new visual and analytical components. Its distinguishing feature, relative to most previous visual analytics tools, is the capacity to combine advanced visual, computational, and statistical methods to support both data-driven and model-driven analyses interactively. It supports the identification and analysis of spatial, temporal, and high-dimensional clusters, patterns, and relationships at multiple scales. The VIT employs an array of data mining methods (e.g., spatially-constrained hierarchical clustering methods, self-organizing maps) and statistical methods (e.g., Moran's I, connections to SaTScan), integrating them through highly interactive, dynamically linked visualization methods (e.g., reorderable matrices, parallel coordinate plots) and graphical user interfaces. The VIT can feed the output of one method (e.g., a model-driven method) into other methods (e.g., data-driven ones), facilitating quick hypothesis generation and verification. The VIT also supports saving intermediate findings during data analysis, enabling incremental analysis across multiple sessions (e.g., over weeks or months). The VIT is a component-based framework that can be customized to support studies in different research topics and domains.
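
As one illustration of the kind of method the VIT combines, here is a minimal Python sketch (standard library only) of contiguity-constrained agglomerative clustering: every region starts as its own cluster, and at each step the two clusters that share a border and have the closest mean attribute vectors (centroid linkage) are merged, until `k` clusters remain. The function name `constrained_cluster`, the centroid-linkage criterion, and the input format are hypothetical choices for illustration; this is not the VIT's own clustering code.

```python
from math import dist


def constrained_cluster(attrs, adjacency, k):
    """Contiguity-constrained agglomerative clustering (hypothetical sketch).

    attrs: dict region_id -> tuple of attribute values
    adjacency: iterable of (region_a, region_b) pairs that share a border
    k: number of clusters at which to stop merging
    Returns a list of sets of region ids.
    """
    dims = len(next(iter(attrs.values())))
    clusters = {r: {r} for r in attrs}  # cluster label -> member regions
    label = {r: r for r in attrs}       # region -> current cluster label

    def centroid(members):
        # Mean attribute vector of a cluster.
        return tuple(sum(attrs[m][d] for m in members) / len(members)
                     for d in range(dims))

    while len(clusters) > k:
        # Contiguity constraint: only clusters sharing a border may merge.
        candidates = {frozenset((label[a], label[b]))
                      for a, b in adjacency if label[a] != label[b]}
        if not candidates:
            break  # the remaining clusters are not spatially connected
        best = min(candidates,
                   key=lambda p: dist(*(centroid(clusters[c]) for c in p)))
        keep, drop = tuple(best)
        merged = clusters.pop(drop)
        for region in merged:
            label[region] = keep
        clusters[keep] |= merged
    return list(clusters.values())
```

Because only bordering clusters may merge, every resulting cluster is a spatially contiguous group of regions, even when two non-adjacent areas have nearly identical attributes.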

The GeoViz Toolkit: Like the Visual Inquiry Toolkit, the GeoViz Toolkit is partially derived from GeoVISTA Studio code. A core goal in developing GeoViz was to make it easier for end users to begin analyzing their data. It accomplishes this through a flexible user interface that lets non-programmers quickly assemble their own custom applications, and through support for standard data formats, which simplifies data input. The GeoViz Toolkit includes versions of the RadViz, PROCLUDE, and Moran's I tools described below, along with a wide range of mapping, statistical graphics, and other analytical methods.

RadViz: The RadViz component in GeoVISTA Studio is an interactive tool for visualizing highly multivariate geographical datasets. It is based on the RADVIZ visualization method introduced by Ankerst and colleagues (Ankerst et al., 1998), which maps data points from a high-dimensional data space into a two-dimensional space. Adding RadViz to GeoVISTA Studio extends Studio's capabilities for multivariate data analysis and visualization. The RadViz tool is implemented to facilitate interactive exploration of multivariate geographical datasets, and its integration into the GeoViz Toolkit makes this novel analytical method easy to link with other tools and more accessible to users.
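
The core RADVIZ projection can be sketched in a few lines. Each of the m variables gets an anchor point evenly spaced on the unit circle; each record's values, min-max normalized per variable, act as spring constants pulling the record toward those anchors, and the record is drawn at the equilibrium point, which is the weighted average of the anchor positions. This Python sketch assumes evenly spaced anchors in input-column order and min-max normalization; the GeoVISTA RadViz component may order anchors, normalize, or handle edge cases differently.

```python
import math


def radviz(rows):
    """Project rows (lists of m numeric values) to 2-D RadViz coordinates."""
    m = len(rows[0])
    # Anchor for variable j: evenly spaced on the unit circle (assumed order).
    anchors = [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
               for j in range(m)]
    # Min-max normalize each variable to [0, 1] across all rows.
    lo = [min(r[j] for r in rows) for j in range(m)]
    hi = [max(r[j] for r in rows) for j in range(m)]
    points = []
    for r in rows:
        w = [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(m)]
        total = sum(w)
        if total == 0:
            points.append((0.0, 0.0))  # no pull from any anchor: place at center
            continue
        # Equilibrium of the springs = weighted average of anchor positions.
        x = sum(w[j] * anchors[j][0] for j in range(m)) / total
        y = sum(w[j] * anchors[j][1] for j in range(m)) / total
        points.append((x, y))
    return points
```

Records dominated by one variable land near that variable's anchor, while records with balanced values fall near the center; because reordering the anchors changes the picture, interactive control over the layout is useful.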

PROCLUDE: PROCLUDE (PROgrams for CLUster DEtection) is a toolkit application containing multiple spatial cluster detection routines. As part of the grant reported here, a new cluster detection method based on genetic algorithms was developed and implemented in PROCLUDE. PROCLUDE also contains new Java-based implementations of three established cluster detection methods: Openshaw's Geographical Analysis Machine (GAM), case-point-centered searching (proposed by Besag and Newell), and randomized GAM (proposed by Fotheringham and Zhan). In a related effort, an approach that uses genetic algorithms to configure the "flock of boids" (bird objects) method of cluster detection has been implemented and tested. As noted above, PROCLUDE has been integrated into the GeoViz Toolkit, making it possible to explore its output with the full range of GeoViz tools.
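
To show the basic idea behind Openshaw's Geographical Analysis Machine, the following Python sketch lays a grid of circle centers over the study area for several radii, counts observed cases and population at risk inside each circle, and flags circles whose case count would be improbable under a Poisson model with the study-wide rate. The grid spacing of 0.2 times the radius, the default threshold `alpha=0.002`, the function names, and the `(x, y, cases, population)` input format are assumptions for illustration; this is not PROCLUDE's Java implementation.

```python
import math


def poisson_upper_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summed directly (fine for modest mu)."""
    if k <= 0:
        return 1.0
    term = math.exp(-mu)  # P(X = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(X <= k - 1)
        term *= mu / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def gam(areas, radii, alpha=0.002):
    """Flag circles whose case count is improbably high (hypothetical sketch).

    areas: list of (x, y, cases, population); cases must be integers.
    Returns (center_x, center_y, radius, observed, expected) per flagged circle.
    """
    rate = sum(a[2] for a in areas) / sum(a[3] for a in areas)  # study-wide rate
    xmin, xmax = min(a[0] for a in areas), max(a[0] for a in areas)
    ymin, ymax = min(a[1] for a in areas), max(a[1] for a in areas)
    hits = []
    for r in radii:
        step = 0.2 * r  # assumed spacing so that neighboring circles overlap
        nx = int((xmax - xmin) / step) + 1
        ny = int((ymax - ymin) / step) + 1
        for i in range(nx):
            for j in range(ny):
                cx, cy = xmin + i * step, ymin + j * step
                inside = [a for a in areas
                          if math.hypot(a[0] - cx, a[1] - cy) <= r]
                observed = sum(a[2] for a in inside)
                expected = rate * sum(a[3] for a in inside)
                if expected > 0 and poisson_upper_tail(observed, expected) < alpha:
                    hits.append((cx, cy, r, observed, expected))
    return hits
```

Because many overlapping circles are tested, the per-circle threshold is not a study-wide significance level; flagged circles are best read as candidate clusters for further examination.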

G-EX Portal: The Geovisual EXplication (G-EX) Portal is a web-based geocollaboration and resource-sharing tool that supports the use of the aforementioned exploratory spatial data analysis (ESDA) and geovisualization tools, and the interpretation of results derived from these tools (see figures above). Before analysis, users can access the G-EX Portal to download software and related learning materials (e.g., training manuals, multimedia tutorials, example problems). After analysis, users can upload artifacts generated during their analysis sessions and collaborate on them asynchronously.

GeoDa: GeoDa is a suite of exploratory spatial data analysis (ESDA) tools developed at the University of Illinois by Co-PI Anselin, primarily as part of the NSF-funded Center for Spatially Integrated Social Science. In this NCI project, we have converted spatial analytical methods from GeoDa into open-source Java components for integration with other open-source components within GeoVISTA Studio. The converted methods include routines to construct contiguity information from geographic boundary files, smooth rates (including spatial smoothing), calculate and visualize global spatial autocorrelation, and perform inference for Moran's I using random permutation. The next version of GeoDa (OpenGeoDa) will be entirely open source, relying on wxWidgets instead of the proprietary MapObjects.
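
Global Moran's I with permutation-based inference can be sketched in Python as follows. The sketch assumes binary contiguity weights stored as neighbor lists (the kind of structure a rook or queen contiguity construction from boundary files would produce) and uses I = (n / S0) * sum_i sum_j w_ij z_i z_j / sum_i z_i^2, where z_i are deviations from the mean and S0 is the sum of all weights. It reports the pseudo p-value (M + 1) / (R + 1) for positive autocorrelation, where M of R random permutations give an I at least as large as the observed one. The function names are hypothetical; this is not GeoDa's code.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights.

    values: list of floats, one per area.
    neighbors: list of lists; neighbors[i] holds the indices adjacent to area i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbors)  # sum of all (binary) weights
    num = sum(z[i] * z[j] for i in range(n) for j in neighbors[i])
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)


def permutation_test(values, neighbors, permutations=999, seed=0):
    """Observed I and a one-sided pseudo p-value from random permutations."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # reassign values to locations at random
        if morans_i(shuffled, neighbors) >= observed:
            extreme += 1
    return observed, (extreme + 1) / (permutations + 1)
```

Permuting values over fixed locations keeps the value distribution intact while destroying any spatial arrangement, so the permuted statistics approximate the distribution of I under spatial randomness without distributional assumptions.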

PySpace: PySpace is an open-source software development effort to implement spatial statistical methods in general, and spatial regression analysis in particular, using Python and Numerical Python. Current activities focus on a set of classes and methods that carry out diagnostics for spatial correlation in linear regression models and estimate spatial lag and spatial error specifications. In this NCI project, we have demonstrated the application of these methods to exploring cancer mortality rates for counties in the Appalachia Cancer Network. We have also developed tools in Jython (an implementation of the high-level, dynamic, object-oriented language Python, written in 100% Pure Java) that allow PySpace methods to be incorporated into Java-based applications built with GeoVISTA Studio and GeoTools components.
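
As an illustration of a regression diagnostic for spatial correlation (not PySpace code), the following Python sketch computes the Lagrange Multiplier test for spatial error dependence on OLS residuals e: LM = (e'We / (e'e / N))^2 / tr(W'W + WW), compared against a chi-square distribution with one degree of freedom. The one-regressor OLS helper, the dense list-of-lists weights matrix, and the function names are assumptions chosen to keep the example self-contained.

```python
import math


def ols_simple(x, y):
    """Residuals of y = a + b*x fitted by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def lm_error(residuals, w):
    """LM test for spatial error autocorrelation in OLS residuals.

    w: dense n x n spatial weights matrix (list of lists), usually row-standardized.
    Returns (statistic, p_value) using a chi-square(1) reference distribution.
    """
    e = residuals
    n = len(e)
    s2 = sum(ei * ei for ei in e) / n                      # e'e / N
    we = [sum(w[i][j] * e[j] for j in range(n)) for i in range(n)]
    ewe = sum(e[i] * we[i] for i in range(n))              # e'We
    # tr(W'W) = sum of w_ij^2 ; tr(WW) = sum of w_ij * w_ji
    t = sum(w[i][j] * w[i][j] + w[i][j] * w[j][i]
            for i in range(n) for j in range(n))
    stat = (ewe / s2) ** 2 / t
    p = math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail
    return stat, p
```

A large statistic (small p-value) suggests the residuals are spatially autocorrelated, in which case a spatial specification may be more appropriate than plain OLS.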


A series of web applets was built using GeoVISTA Studio to combine components developed in this project with those developed previously by the GeoVISTA research team or by developers at other institutions. These applets illustrate the kinds of health data analysis that the tools under development can support. While the applets do not support input of user data, they allow anyone who accesses them to explore the datasets we provide (usually cancer and related statistics downloaded from the NCI and other agency websites). Users can try the tools on their own data by using GeoVISTA Studio.

All applets require Java Plug-in 1.3 or later, with JavaScript enabled. Please respect this software's licenses (see below).

Univariate Map
This map applet features interactive spatial and color selection tools. Users can create a multipart line through color space to better visualize spatial patterns.
Author(s): Frank Hardisty
Bivariate Map
Users can create bivariate color schemes to visualize spatial patterns in two variables simultaneously.
Author(s): Frank Hardisty
Parallel Coordinate Plot
Users can explore multivariate statistical patterns with this interactive tool.
Author(s): Flo Lederman (Parvis), Frank Hardisty
Map and Scatterplot Matrix
This applet combines a map with a scatterplot matrix. In the scatterplots, the position and color of each observation are determined by the variables named on the diagonal; in the map, position is geographic and color is determined by those variables. With this tool, we can see attribute and geographic relationships among several variables at once.
Author(s): Xiping Dai, Frank Hardisty
Map and Linkgraph
This applet contains a map linked to a node-and-edge graph representation of U.S. states, in which states are nodes and links between them are determined by their similarity in one or more variables, as found by the minimum spanning tree in attribute space. We can use this tool to explore the geographic relationships between similar observations.
Author(s): Alex Shapiro (Touchgraph), Frank Hardisty, Diansheng Guo
Subspace Browser
Here the nodes are variables, and the graph shows the minimum spanning tree based on their correlation coefficients. This provides a quick way to browse relationships among many variables.
Author(s): Alex Shapiro (Touchgraph), Frank Hardisty, Diansheng Guo
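
Both the Linkgraph and the Subspace Browser rely on a minimum spanning tree. For the Subspace Browser case, a minimal Python sketch could compute Pearson correlations between all pairs of variables, convert them to distances with 1 - |r|, and run Prim's algorithm on the complete graph of variables. The 1 - |r| transform and the function names are assumptions for illustration; the applets' TouchGraph-based code may define the tree differently.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_mst(columns):
    """Prim's MST over variables; edge weight = 1 - |r| (an assumed transform).

    columns: dict variable name -> list of values (all the same length).
    Returns a list of (var_a, var_b, r) tree edges.
    """
    names = list(columns)
    r = {(a, b): pearson(columns[a], columns[b])
         for a in names for b in names if a != b}
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge joining the current tree to a variable outside it.
        a, b = min(((u, v) for u in in_tree for v in names if v not in in_tree),
                   key=lambda e: 1 - abs(r[e]))
        in_tree.add(b)
        edges.append((a, b, r[(a, b)]))
    return edges
```

For the Linkgraph, the same Prim step would run over observations (states) instead of variables, with an attribute-space distance such as Euclidean distance in place of 1 - |r|.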

Source code for the above software will be available soon.

All software is licensed under the GNU Lesser General Public License, with the following exceptions: the Parallel Coordinate Plot is based on Parvis, so its code is licensed under the GNU General Public License, and the TouchGraph-based Linkgraph is Apache licensed.