Penn State GeoVISTA Center

This material is based upon work supported by the National Institutes of Health under Grant # R01 CA95949-01

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Institutes of Health or the National Cancer Institute.

  Featured Research

Research Context:

Few cancer types are evenly distributed across a population. Geographic variations in cancer mortality have been associated with risk factor prevalence, screening behaviors, health care access and utilization, genetic predisposition, and occupational hazards. Consequently, mapping cancer records (and other health statistics) has been an important catalyst for developing hypotheses about cancer etiology. Such mapping has also helped to identify flaws in cancer surveillance networks and to develop effective cancer control policy.

Geographic Information Systems (GIS) have extended the promise of health mapping because they can integrate heterogeneous data from diverse sources. However, the capabilities of current GIS (even when linked with other spatial analysis methods and tools) may limit, and bias, the geographic observations that prompt investigations. The risk remains that spurious geographic variations in cancer will be observed, that real variations will go unrecognized, or that relationships between cancer and risk or preventive factors will be misunderstood.

Primary Goals:

  1. To design, implement, and integrate a suite of geovisualization, exploratory spatial data analysis (ESDA), and computational software components targeted at cancer research, surveillance, and control. We will leverage an existing software platform, GeoVISTA Studio (described below), which supports independent development of cancer data analysis components by our research team and by others. To this platform we will add exploratory statistical and visualization components that can be combined in a highly coordinated manner through a flexible visual interface. Integrated through Studio, the resulting applications will support the entire scientific process, from data exploration and hypothesis generation, through rigorous analysis of hypotheses, to accessible presentation of findings (months 1-36).
  2. To improve methodologies for exploratory research in cancer epidemiology and etiology. Exploratory analysis carries a high risk of finding spurious associations and missing real ones, thereby introducing errors and misinterpretations. We propose research targeted at improvements in two areas: (a) development of sound data exploration methodologies and practices and (b) development of tools to assess and understand the validity and reliability of the results. The latter will draw on advances from the machine learning and data mining communities in searching through vast 'hypothesis spaces', and on research on visualizing data reliability (months 25-54).
  3. To ensure that the developed methods and components (a) are usable by and meaningful to professionals engaged in cancer control and (b) can be used accurately to address important questions in cancer research, surveillance and control. These goals will be accomplished through two linked activities. First, formal usability assessment methods will be applied throughout the process of tool design, implementation, and deployment (months 1-60). Second, we will carry out proof-of-concept applications to case studies focused on cancer research and policy. These case study applications will address important questions related to accurate assessment of geospatial patterns in cancer and will provide an opportunity to assess both usability and usefulness of tools in realistic applications (months 25-60).

Software Development:

The project leverages GeoVISTA Studio, a Java-based software environment for building user-specific applications that is being developed by the GeoVISTA Center. Studio's visual components fuse geographic visualization, statistical graphics, and information visualization. Its statistical components focus on ‘local’ spatial statistics, and its computational components will provide pattern recognition and search across large data sets. Studio is open-source software with a growing developer and user base. See our web site for some early examples of methods developed for health and demographic data analysis.