Parallel Computing

Several GIS applications are characterized by the vast amount of information that must be stored, retrieved and analyzed. Beyond simply handling these large data sets, a GIS should also answer queries on the data efficiently enough to meet real-time constraints. As a result, a GIS must satisfy three main requirements to handle the demands of current and emerging applications:

 

1. The GIS must be able to store large data sets.

2. The GIS must be able to store, retrieve and process these large data sets efficiently.

3. The GIS must provide low response times and high throughput for complex queries on the data sets.

 

To meet these requirements, a GIS must run on a high performance computer system. Conventional GIS platforms attach a high performance Input/Output (I/O) subsystem, which stores the large data repositories, to a high performance workstation. However, despite the I/O parallelism offered by some of these subsystems (such as RAID), the channel between the processor and the I/O system can itself become a bottleneck that limits the speed of data transfer. Further, such an architecture provides no computational power for executing complicated queries beyond the raw processing power of the workstation itself. This observation leads us to believe that a balanced high performance platform for a GIS should support parallelism in processing (CPUs), primary (memory) and secondary (disk) storage, and I/O channels. Recent trends in computer architecture show that a Network of Workstations (NOW) is emerging as a cost-effective solution for high performance computing. The term "workstations" is used loosely in this context and includes high performance personal computers as well. Such systems are more commercially viable than custom-built parallel machines because they use cost-effective off-the-shelf components. Before these systems can be used to build a high performance GIS, several open research issues in harnessing the full capabilities of such a platform must be addressed, and this research takes a step towards that goal.

Focusing specifically on the R-Tree, a common data structure for storing spatial information, we have proposed different techniques for distributing this data structure across the workstations of a NOW platform. We have implemented a client-server architecture to evaluate these distribution strategies experimentally on a cluster of UltraSPARC model 170 workstations connected by Ethernet and Myrinet. Using this framework, we have studied the performance of two distribution schemes and shown that parallelism can yield significant performance improvements for insert and spatial search operations.

Future Directions

In the near future, we plan to prototype other distribution strategies, develop a model to estimate the performance and efficiency of these schemes, and design and implement algorithms for spatial join operations on the different distributions. Following this, we plan to extend the approach to other data structures (such as those for storing temporal information) and to study the issues in implementing a parallel GIS that supports these different data structures and a range of queries.
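
To make the idea of spreading a spatial index across workstations concrete, here is a minimal Python sketch. It is not either of the two distribution schemes evaluated in the paper, which this page does not describe. It assumes a simple round-robin placement of rectangles, simulates each workstation with an in-process object, and replaces each node's local R-Tree with a flat list guarded by a single bounding rectangle. The names Node and DistributedIndex are hypothetical, and threads stand in for network requests.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Optional[Rect], b: Rect) -> Rect:
    """Smallest rectangle covering both a and b (a may be None)."""
    if a is None:
        return b
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


@dataclass
class Node:
    """Hypothetical stand-in for one workstation.

    Assumption: a flat list replaces the node's local R-Tree.
    """
    rects: List[Rect] = field(default_factory=list)
    mbr: Optional[Rect] = None  # bounding rectangle of everything stored here

    def insert(self, r: Rect) -> None:
        self.rects.append(r)
        self.mbr = union(self.mbr, r)

    def search(self, q: Rect) -> List[Rect]:
        # Skip the scan when the query cannot touch anything on this node.
        if self.mbr is None or not intersects(self.mbr, q):
            return []
        return [r for r in self.rects if intersects(r, q)]


class DistributedIndex:
    """Hypothetical client: round-robin inserts, fan-out window search."""

    def __init__(self, n_nodes: int) -> None:
        self.nodes = [Node() for _ in range(n_nodes)]
        self._next = 0

    def insert(self, r: Rect) -> None:
        # Assumption: round-robin placement, chosen only for simplicity.
        self.nodes[self._next].insert(r)
        self._next = (self._next + 1) % len(self.nodes)

    def search(self, q: Rect) -> List[Rect]:
        # Query every node concurrently; threads simulate remote requests.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            parts = list(pool.map(lambda n: n.search(q), self.nodes))
        return sorted(r for part in parts for r in part)


if __name__ == "__main__":
    index = DistributedIndex(n_nodes=3)
    for rect in [(0, 0, 1, 1), (2, 2, 3, 3), (5, 5, 6, 6), (0.5, 0.5, 2.5, 2.5)]:
        index.insert(rect)
    print(index.search((0.9, 0.9, 2.1, 2.1)))
```

Round-robin placement balances storage but ignores spatial locality, so most window queries touch every node. A placement that groups nearby rectangles would let the per-node bounding rectangle prune more requests, at the risk of uneven load.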

Contact Us

If you would like to read more about our efforts, please download the paper:

Title: Evaluating Parallel R-Tree Implementations on a Network of Workstations.

Authors: Ning An, Liujian Qian, Anand Sivasubramaniam, Tom Keefe

Date: 1998

Abstract: It is becoming increasingly important that a Geographical Information System delivers high performance to efficiently store, retrieve and process the voluminous data that it needs to handle. It is necessary to employ processing and storage parallelism for scalable long-term solutions. With the demise of many custom-built parallel machines, it is imperative that we use off-the-shelf technology to provide this parallelism. A closely-coupled network of workstations is a viable alternative.

In this paper, we explore techniques for distributing a spatial data structure (R-tree) across a network of workstations. We provide a framework to explore design alternatives in distributing the R-tree across workstations. We also develop an extensive system to implement and evaluate these alternatives. Specifically, we show this by implementing two distribution schemes, and evaluating their performance for insert and spatial search operations on two different data sets.

Publication: Technical Report CSE-98-006, May 1998.

Keywords: Geographic Information Systems, Parallel Processing, Spatial Data Structures, Networks of Workstations

Full Paper...



Copyright © 1998-2000 Pennsylvania State University