Marquette Researchers Earn Recognition for Adaptive Spatial Partitioning Algorithm
The Northwestern Mutual Data Science Institute would like to recognize Dr. Satish Puri, Assistant Professor in Marquette University’s Computer Science Department, and his graduate student, Jie Yang, for the publication of their paper “Efficient Parallel and Adaptive Partitioning for Load Balancing in Spatial Join” at the International Symposium on Parallel and Distributed Processing. The conference acceptance rate was 24.7% (110 research papers selected out of 446 submissions).
Puri recently sat down with the NMDSI to explain his work on operationalizing spatial join using geospatial data and the inherent challenges of storing and retrieving geospatial data.
Dr. Puri’s research focuses on applying computer science to geospatial problems. Simply put, geospatial data is data that contains location information: anything that can be mapped using latitude and longitude coordinates. For example, geospatial data is involved whenever you use maps on your smartphone or tweet with location tracking enabled.
The question Puri and Yang asked was how to efficiently combine two different geospatial datasets. “Combining the different datasets has value because you can extract new information. For instance, on one map there are all the roads in the United States, and a second map contains rivers. If you overlay or join these maps to create one map, you would get a spatial combination of datasets that can provide new insights. The problem is that existing software for Big Spatial Data is not fast enough to handle this request in real time.”
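To make the roads-and-rivers example concrete, here is a minimal sketch of a spatial join in Python. It is an illustration only, not the method from the paper: each geometry is reduced to a hypothetical bounding box, and the names `Box`, `intersects`, and `spatial_join`, along with the sample coordinates, are assumptions made for this example. It compares every road with every river, which is exactly the kind of brute-force work that becomes too slow on large datasets.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box standing in for a road or river (hypothetical)."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: Box, b: Box) -> bool:
    """Return True when the two boxes overlap or touch."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def spatial_join(left, right):
    """Naive spatial join: test every pair and keep the ones that intersect."""
    return [(a.name, b.name) for a in left for b in right if intersects(a, b)]


# Made-up sample data, not taken from the datasets described in the article.
roads = [Box("road_1", 0, 0, 4, 1), Box("road_2", 5, 5, 9, 6)]
rivers = [Box("river_1", 3, 0, 3.5, 8), Box("river_2", 20, 20, 25, 25)]

pairs = spatial_join(roads, rivers)
print(pairs)
```

The cost of this version grows with the product of the two dataset sizes, which is why real systems first partition the space so that only nearby geometries are compared.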
One way to increase the speed is to partition the work and balance the load. For instance, Google searches are executed on hundreds of machines working together, with the results compiled and sent to the user. Puri says that by distributing work across multiple processors, or CPUs, processing times can be reduced enough to allow real-time interaction.
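As a rough sketch of what balancing the load can mean, the Python snippet below assigns chunks of work to workers using a common greedy heuristic: take the most expensive chunk first and give it to whichever worker currently has the least work. This is a generic textbook technique chosen for illustration, not the scheme used in the paper, and the function name `assign_tasks`, the chunk names, and the cost values are all hypothetical.

```python
import heapq


def assign_tasks(task_costs, num_workers):
    """Greedy balancing: largest task first, always to the least-loaded worker."""
    # Heap of (current_load, worker_id) so the least-loaded worker pops first.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}

    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))

    loads = {w: sum(task_costs[t] for t in tasks) for w, tasks in assignment.items()}
    return assignment, loads


# Hypothetical per-chunk costs, e.g. the number of geometries in each chunk.
task_costs = {"cell_a": 8, "cell_b": 7, "cell_c": 3, "cell_d": 2, "cell_e": 2}
assignment, loads = assign_tasks(task_costs, num_workers=2)
print(assignment)
print(loads)
```

The total running time of a parallel job is set by its busiest worker, so the goal of any such scheme is to keep the largest per-worker load as small as possible.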
Given that spatial datasets are often terabytes in size, combining them to extract meaningful information is very data intensive. Relying on Puri’s research expertise in high-performance computing and in parallel and distributed computing, the researchers distributed the spatial join workload across 4,000 CPU cores of the Bridges cluster at the Pittsburgh Supercomputing Center.
Using their newly developed adaptive spatial partitioning algorithm, they tested their approach using a pair of real-world datasets, one with 700 million geometries and another with 10 million polygons from roads and parks data. The outcome was that their scalable algorithm returned results in 7 seconds, showing a decrease in run time compared to previous “best” versions.
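The article does not describe how the adaptive partitioning works internally, so the following Python sketch shows only the general idea behind adaptive spatial partitioning, using a simple quadtree-style split as a stand-in. Dense regions are divided into smaller cells while sparse regions stay large, so each cell holds a comparable amount of work. The function `adaptive_partition`, its parameters, and the sample points are assumptions for illustration and should not be read as the authors' algorithm.

```python
def adaptive_partition(points, bounds, max_points, max_depth=8):
    """Split a cell into four quadrants while it holds more than max_points."""
    min_x, min_y, max_x, max_y = bounds
    if len(points) <= max_points or max_depth == 0:
        return [(bounds, points)]

    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2

    # Key is (is_right_half, is_top_half); each point lands in exactly one quadrant.
    quadrants = {(r, t): [] for r in (False, True) for t in (False, True)}
    for x, y in points:
        quadrants[(x >= mid_x, y >= mid_y)].append((x, y))

    cells = []
    for (right, top), subset in quadrants.items():
        sub_bounds = (
            mid_x if right else min_x,
            mid_y if top else min_y,
            max_x if right else mid_x,
            max_y if top else mid_y,
        )
        cells.extend(adaptive_partition(subset, sub_bounds, max_points, max_depth - 1))
    return cells


# Made-up data: a dense cluster near the origin plus a few scattered points.
cluster = [(float(i), float(j)) for i in range(5) for j in range(5)]
scattered = [(10.0, 10.0), (90.0, 20.0), (30.0, 80.0), (70.0, 70.0)]
points = cluster + scattered

cells = adaptive_partition(points, bounds=(0.0, 0.0, 100.0, 100.0), max_points=10)
for cell_bounds, cell_points in cells:
    if cell_points:
        print(cell_bounds, len(cell_points))
```

A fixed grid over the same data would place the whole cluster in a single cell and leave most other cells empty; the adaptive split keeps every cell at or below the chosen limit, which makes the cells easier to distribute evenly across CPU cores.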
Both researchers are pleased with the results of their work and hope to explore their findings in different ways. Puri continues to teach at Marquette University, with research interests now expanded to include spatial big data, while Yang is now pursuing a doctorate in computer science. Their full paper can be accessed online.