Graduate Student Profiles

Sen Xu

Sen Xu is a Ph.D. student in Geography at Penn State, working under the advisement of Dr. Alexander Klippel. Sen is also enrolled in a Ph.D. Minor in Computational Science through the Department of Aerospace Engineering, College of Engineering, PSU. Last fall, Sen received his Master of Science degree in Geography with his thesis titled "Exploring regional variation in spatial language - a case study on spatial orientation by using volunteered spatial language data."

As a graduate affiliate of the GeoVISTA Center, Sen contributes to both the GeoCAM and the STempo projects. He specializes in spatial linguistic analyses, exploring the application of various knowledge discovery and data mining approaches. He is also interested in Spatial Cognition, Thematic Cartography, Visualization, Crowd-sourcing, and Knowledge Mining (from the Web). Sen is also a member of the Human Factors in GIScience Lab.

About Sen's Thesis Research

Exploring regional variation in spatial language - a case study on spatial orientation by using volunteered spatial language data

Spatial language, such as route directions, is language pertaining to spatial situations and spatial relationships between objects. Spatial language is an important medium through which we study humans' representation, perception, and communication of spatial information. Existing spatial language studies mostly use data collected via time-consuming experiments, which restricts them to small sample sizes and makes it difficult to detect how spatial language varies from one region to another. More recently, larger sample sizes have become possible due to the abundance of volunteered spatial language data on the World Wide Web (WWW), such as directions on hotels' websites.

Drawing on the WWW, Sen developed a scheme for collecting spatial language data. Using automated web crawling, classification of spatial language text documents based on computational linguistic methods, and geo-referencing of text documents, he built the Spatially-strAtified Route Direction Corpus (the SARD Corpus). Focusing on route directions on the WWW, the SARD Corpus has more than 10,000 spatially distributed documents covering three countries (the United States, the United Kingdom, and Australia). As a case study on the SARD Corpus, Sen applied a semantic categorical analysis of cardinal vs. relative direction term usage. He then used the TermTree Tool and Visual Inquiry Toolkit to explore regional linguistic variation in spatial language usage. Sen's analysis showed similarities and differences in directional term usage at the national and regional levels. The findings contribute to the field of spatial cognition; the design and implementation of a geo-referenced, large-scale corpus built from documents crawled from the WWW offers a methodological contribution to corpus linguistics, spatial cognition, and GIScience.
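To give a rough sense of what comparing cardinal vs. relative direction term usage across regions might involve, here is a minimal Python sketch. It is not Sen's actual method: the term lists, the function names, and the (region, text) input format are all assumptions made for illustration, and simple word matching will miscount ambiguous words such as "right."

```python
import re

# Hypothetical term lists for illustration only; the categories used
# in the thesis are not specified on this page.
CARDINAL_TERMS = {"north", "south", "east", "west"}
RELATIVE_TERMS = {"left", "right", "straight"}


def count_direction_terms(text):
    """Count cardinal and relative direction terms in one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = {"cardinal": 0, "relative": 0}
    for token in tokens:
        if token in CARDINAL_TERMS:
            counts["cardinal"] += 1
        elif token in RELATIVE_TERMS:
            counts["relative"] += 1
    return counts


def counts_by_region(documents):
    """Pool term counts per region from (region, text) pairs."""
    totals = {}
    for region, text in documents:
        region_totals = totals.setdefault(region, {"cardinal": 0, "relative": 0})
        doc_counts = count_direction_terms(text)
        region_totals["cardinal"] += doc_counts["cardinal"]
        region_totals["relative"] += doc_counts["relative"]
    return totals


if __name__ == "__main__":
    # Toy documents, not drawn from the SARD Corpus.
    sample = [
        ("US", "Go north on Main St."),
        ("UK", "Turn left at the roundabout."),
        ("US", "Turn right."),
    ]
    print(counts_by_region(sample))
```

Pooled counts like these could then be turned into per-region proportions and compared, which is the kind of question the case study explores at a much larger scale and with richer tools.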

Sen's Full Thesis | Slides about Sen's thesis | Poster 1 | Poster 2
